SOURCE CODING ENHANCEMENT USING SPECTRAL-BAND REPLICATION 



TECHNICAL FIELD 

In source coding systems, digital data is compressed before transmission or storage to reduce the required bitrate or
storage capacity. The present invention relates to a new method and apparatus for the improvement of source coding
systems by means of Spectral Band Replication (SBR). Substantial bitrate reduction is achieved while maintaining
the same perceptual quality or, conversely, an improvement in perceptual quality is achieved at a given bitrate. This
is accomplished by means of spectral bandwidth reduction at the encoder side and subsequent spectral band
replication at the decoder, whereby the invention exploits new concepts of signal redundancy in the spectral domain.

BACKGROUND OF THE INVENTION 

Audio source coding techniques can be divided into two classes: natural audio coding and speech coding. Natural
audio coding is commonly used for music or arbitrary signals at medium bitrates, and generally offers wide audio
bandwidth. Speech coders are basically limited to speech reproduction but can, on the other hand, be used at very low
bitrates, albeit with low audio bandwidth. Wideband speech offers a major subjective quality improvement over
narrowband speech. Increasing the bandwidth not only improves intelligibility and naturalness of speech, but also
facilitates speaker recognition. Wideband speech coding is thus an important issue in next generation telephone
systems. Further, due to the tremendous growth of the multimedia field, transmission of music and other non-speech
signals at high quality over telephone systems is a desirable feature.

A high-fidelity linear PCM signal is very inefficient in terms of bitrate versus the perceptual entropy. The CD
standard dictates 44.1 kHz sampling frequency, 16 bits per sample resolution and stereo. This equals a bitrate of
1411 kbit/s. To drastically reduce the bitrate, source coding can be performed using split-band perceptual audio
codecs. These natural audio codecs exploit perceptual irrelevancy and statistical redundancy. Using the best codec
technology, approximately 90% data reduction can be achieved for a standard CD-format signal with practically no
perceptible degradation. Very high sound quality in stereo is thus possible at a compression factor of approximately
15:1. Some perceptual codecs offer even higher compression ratios. To achieve this, it is common to reduce the
sample-rate and thus the audio bandwidth. It is also common to decrease the number of quantization levels, allowing
occasionally audible quantization distortion, and to employ degradation of the stereo field through intensity coding.
Excessive use of such methods results in annoying perceptual degradation. Current codec technology is near
saturation, and further progress in coding gain is not expected. In order to improve the coding performance further,
a new approach is necessary.
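The bitrate figures above follow from simple arithmetic. A minimal sketch (the helper name `pcm_bitrate` is ours, chosen for illustration):

```python
def pcm_bitrate(fs_hz: int, bits_per_sample: int, channels: int) -> int:
    """Raw linear-PCM bitrate in bit/s."""
    return fs_hz * bits_per_sample * channels


cd_bps = pcm_bitrate(44_100, 16, 2)      # CD format: 1_411_200 bit/s, about 1411 kbit/s
rate_15_to_1_kbps = cd_bps / 15 / 1000   # bitrate implied by a 15:1 compression factor
rate_90pct_kbps = cd_bps * 0.10 / 1000   # bitrate left after 90 % data reduction
print(cd_bps, round(rate_15_to_1_kbps, 1), round(rate_90pct_kbps, 1))
```

The two approximate figures quoted in the text thus correspond to roughly 94 to 141 kbit/s for a stereo CD-format signal.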

The human voice and most musical instruments generate quasistationary signals that emerge from oscillating
systems. According to Fourier theory, any periodic signal may be expressed as a sum of sinusoids with the
frequencies f, 2f, 3f, 4f, 5f etc., where f is the fundamental frequency. The frequencies form a harmonic series. A
bandwidth limitation of such a signal is equivalent to a truncation of the harmonic series. Such a truncation alters
the perceived timbre and yields an audio signal that will sound "muffled".
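The effect of a bandwidth limit on a harmonic series is easy to count. The sketch below lists the harmonics that survive an ideal lowpass; the function name and the example values (a 440 Hz tone, a 4 kHz limit) are ours, for illustration only.

```python
def harmonic_series(f0_hz: float, f_max_hz: float) -> list[float]:
    """Harmonics k*f0 (k = 1, 2, ...) at or below f_max_hz."""
    return [k * f0_hz for k in range(1, int(f_max_hz // f0_hz) + 1)]


full = harmonic_series(440.0, 20_000.0)      # wideband: harmonics up to 20 kHz
truncated = harmonic_series(440.0, 4_000.0)  # same tone after a 4 kHz bandwidth limit
print(len(full), len(truncated))             # 45 9
```

The fundamental, and hence the pitch, is kept, but only the first 9 of the 45 partials below 20 kHz remain.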
Prior art methods are mainly intended for improvement of speech codec performance and in particular intended for
High Frequency Regeneration (HFR), an issue in speech coding. Such methods employ broadband linear frequency
shifts, non-linearities or aliasing [U.S. Pat. 5,127,054], generating intermodulation products or other non-harmonic
frequency components which cause severe dissonance when applied to music signals. Such dissonance is referred to
in the speech coding literature as "harsh" and "rough" sounding. Other synthetic speech HFR methods generate
sinusoidal harmonics that are based on fundamental pitch estimation and are thus limited to tonal stationary sounds
[U.S. Pat. 4,771,465]. Such prior art methods, although useful for low-quality speech applications, do not work for
high quality speech or music signals. A few methods attempt to improve the performance of high quality audio
source codecs. One uses synthetic noise signals generated at the decoder to substitute noise-like signals in speech or
music previously discarded by the encoder ["Improving Audio Codecs by Noise Substitution" D. Schultz, JAES,
Vol. 44, No. 7/8, 1996]. This is performed within an otherwise normally transmitted highband on an intermittent
basis when noise signals are present. Another method recreates some missing highband harmonics that were lost in
the coding process ["Audio Spectral Coder" A.J.S. Ferreira, AES Preprint 4201, 100th Convention, May 11-14
1996, Copenhagen] and is again dependent on tonal signals and pitch detection. Both methods operate on a low
duty-cycle basis, offering comparatively limited coding or performance gain.

SUMMARY OF THE INVENTION 

The present invention provides a new method and an apparatus for substantial improvements of digital source
coding systems and more specifically for the improvements of audio codecs. The objective includes bitrate
reduction or improved perceptual quality or a combination thereof. The invention is based on new methods
exploiting harmonic redundancy, offering the possibility to discard passbands of a signal prior to transmission or
storage. No perceptual degradation is perceived if the decoder performs high quality spectral replication according
to the invention. The discarded bits represent the coding gain at a fixed perceptual quality. Alternatively, more bits
can be allocated for encoding of the lowband information at a fixed bitrate, thereby achieving a higher perceptual
quality.

The present invention postulates that a truncated harmonic series can be extended based on the direct relation
between lowband and highband spectral components. This extended series resembles the original in a perceptual
sense if certain rules are followed. First, the extrapolated spectral components must be harmonically related to the
truncated harmonic series, in order to avoid dissonance-related artifacts. The present invention uses transposition as
a means for the spectral replication process, which ensures that this criterion is met. It is, however, not necessary
that the lowband spectral components form a harmonic series for successful operation, since new replicated
components, harmonically related to those of the lowband, will not alter the noise-like or transient nature of the
signal. A transposition is defined as a transfer of partials from one position to another on the musical scale while
maintaining the frequency ratios of the partials. Second, the spectral envelope, i.e. the coarse spectral distribution,
of the replicated highband must reasonably well resemble that of the original signal. The present invention offers
two modes of operation, SBR-1 and SBR-2, that differ in the way the spectral envelope is adjusted.






SBR-1, intended for the improvement of intermediate quality codec applications, is a single-ended process which
relies exclusively on the information contained in a received lowband or lowpass signal at the decoder. The spectral
envelope of this signal is determined and extrapolated, for instance using polynomials together with a set of rules or
a codebook. This information is used to continuously adjust and equalise the replicated highband. The present SBR-1
method offers the advantage of post-processing, i.e. no modifications are needed at the encoder side. A
broadcaster will gain in channel utilisation or will be able to offer improved perceptual quality or a combination of
both. Existing bitstream syntax and standards can be used without modification.
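One way to picture the polynomial extrapolation is sketched below. It is an illustration under stated assumptions, not the patented procedure: the envelope is handled as dB versus frequency, only the upper half of the lowband is fitted, a first-order polynomial is used, and the rules or codebook mentioned above are omitted. The function name `extrapolate_envelope_db` and the example envelope are hypothetical.

```python
import numpy as np


def extrapolate_envelope_db(freqs_hz, env_db, f_cut_hz, f_target_hz, degree=1):
    """Hypothetical SBR-1 helper: fit a polynomial to the upper half of the lowband
    envelope (dB versus Hz) and evaluate it at highband frequencies."""
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    env_db = np.asarray(env_db, dtype=float)
    fit_range = freqs_hz >= f_cut_hz / 2          # assumption: fit the upper lowband only
    coeffs = np.polyfit(freqs_hz[fit_range], env_db[fit_range], degree)
    return np.polyval(coeffs, np.asarray(f_target_hz, dtype=float))


# Decoded lowband up to 8 kHz whose envelope falls by 6 dB per kHz (made-up example).
lowband_f = np.arange(0.0, 8_001.0, 500.0)
lowband_env = -6.0 * lowband_f / 1000.0
high_env = extrapolate_envelope_db(lowband_f, lowband_env, 8_000.0, [10_000.0, 12_000.0])
print(np.round(high_env, 3))  # estimated envelope (dB) for the replicated highband
```

Higher polynomial degrees follow curved envelopes more closely but can diverge quickly outside the fitted range.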

SBR-2, intended for the improvement of high quality codec applications, is a double-ended process where, in
addition to the transmitted lowband signal according to SBR-1, the spectral envelope of the highband is
transmitted. Since the variations of the spectral envelope have a much lower rate than the highband signal
components, only a limited amount of information needs to be transmitted in order to successfully represent the
spectral envelope. SBR-2 can be used to improve the performance of current codec technologies with no or minor
modifications of existing syntax or protocols, and as a valuable tool for future codec development.
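To make the "limited amount of information" concrete, the sketch below reduces one frame of spectrum to a handful of highband band energies. The sampling rate, frame length, Hann window and the four 2 kHz bands are assumptions chosen for illustration, the function name is hypothetical, and quantisation and coding of the values are not shown.

```python
import numpy as np


def coarse_envelope(frame, fs_hz, band_edges_hz):
    """Mean power in dB of one windowed frame within each band [lo, hi)."""
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs_hz)
    env = []
    for lo, hi in zip(band_edges_hz[:-1], band_edges_hz[1:]):
        in_band = (freqs >= lo) & (freqs < hi)
        env.append(10.0 * np.log10(power[in_band].mean() + 1e-12))
    return np.array(env)


fs = 32_000
frame = np.random.default_rng(0).standard_normal(1024)   # 32 ms of test noise
edges = [8_000, 10_000, 12_000, 14_000, 16_000]         # assumed highband split
env = coarse_envelope(frame, fs, edges)
print(len(frame), "samples ->", len(env), "envelope values")
```

The band layout and update rate of an actual SBR-2 system are design choices not fixed by the text above.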

Both SBR-1 and SBR-2 can be used to replicate smaller passbands of the lowband when such bands are shut down
by the encoder as stipulated by the psychoacoustic model under bit-starved conditions. This results in improvement
of the perceptual quality by spectral replication within the lowband in addition to spectral replication outside the
lowband. Further, SBR-1 and SBR-2 can also be used in codecs employing bitrate scalability, where the perceptual
quality of the signal at the receiver varies depending on transmission channel conditions. This usually implies
annoying variations of the audio bandwidth at the receiver. Under such conditions, the SBR methods can be used
successfully in order to maintain a constantly high bandwidth, again improving the perceptual quality.

The present invention operates on a continuous basis, replicating any type of signal contents, including non-tonal
signals (noise-like and transient signals). In addition, the present spectral replication method creates a perceptually
accurate replica of the discarded bands from available frequency bands at the decoder. Hence, the SBR method offers
a substantially higher level of coding gain or perceptual quality improvement compared to prior art methods. The
invention can be combined with such prior art codec improvement methods; however, no performance gain is
expected due to such combinations.

The SBR method comprises the following steps:

- encoding a signal derived from an original signal, where frequency bands are discarded and
the discarding is performed prior to or during encoding, forming a first signal,

- during or after decoding of the first signal, transposing frequency bands of the first signal, forming a second
signal,

- performing spectral envelope adjustment, and

- combining the decoded signal and the second signal, forming an output signal.

The passbands of the second signal may be set [...] or be set in dependence of the temporal characteristics of the
original signal or of the channel conditions. The spectral envelope adjustment is performed based on the spectral
envelope of the first signal or on transmitted envelope information of the original signal.
The present invention includes two basic types of transposers: multiband transposers and time-variant pattern search
prediction transposers, having different properties. A basic multiband transposition may be performed according to
the present invention by the following:

- filtering the signal to be transposed through a set of $N \geq 2$ bandpass filters with passbands comprising the
frequencies $[f_1, f_2], [f_2, f_3], \dots, [f_N, f_{N+1}]$ respectively, forming N bandpass signals,

- shifting the bandpass signals in frequency to regions comprising the frequencies
$[Mf_1, Mf_2], [Mf_2, Mf_3], \dots, [Mf_N, Mf_{N+1}]$, where $M \geq 1$ is the transposition factor, and

- combining the shifted bandpass signals, forming the transposed signal.
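A linear frequency shift keeps the bandwidth of each band, so it reproduces the exact transposition at only one frequency per band. The sketch below shifts each band so that its centre $c_k$ lands on $Mc_k$; that choice of shift, the band edges and the function name `multiband_shift` are assumptions for illustration. A partial at f then maps to $f + (M-1)c_k$, deviating from $Mf$ by $(M-1)(c_k - f)$, i.e. by at most $(M-1)$ times half the bandwidth, and the shifted band stays inside $[Mf_k, Mf_{k+1}]$.

```python
import bisect


def multiband_shift(f, edges, M):
    """Frequency of a partial at f after the basic multiband transposition.

    edges: ascending band edges f_1 < f_2 < ... < f_{N+1}. Each band is shifted linearly
    so that its centre c_k moves to M*c_k (assumed choice of shift)."""
    k = bisect.bisect_right(edges, f) - 1
    if not 0 <= k < len(edges) - 1:
        raise ValueError("f lies outside the filterbank range")
    centre = 0.5 * (edges[k] + edges[k + 1])
    return f + (M - 1) * centre


edges = [2_000 + 100 * i for i in range(21)]   # 20 bands of 100 Hz from 2 to 4 kHz
for f in (2_230.0, 3_050.0, 3_999.0):
    print(f, multiband_shift(f, edges, 2), 2 * f)  # mapped vs exact transposition
```

Narrower bands therefore bring the basic multiband transposition closer to an exact transposition.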

Alternatively, this basic multiband transposition may be performed according to the invention by the following:

- bandpass filtering the signal to be transposed using an analysis filterbank or transform of such a nature
that real- or complex-valued subband signals of lowpass type are generated,

- an arbitrary number of channels k of said analysis filterbank or transform are connected to channels Mk, $M \geq 1$,
in a synthesis filterbank or transform, and

- the transposed signal is formed using the synthesis filterbank or transform.
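The channel connection k to Mk can be tried out with a single DFT frame standing in for the analysis and synthesis filterbanks. This is a deliberately simplified sketch: one frame, no windowing or overlap-add, and no phase adjustment, so it only shows where the energy lands, not the quality of a practical transposer. The function name `map_channels` is hypothetical.

```python
import numpy as np


def map_channels(x, M):
    """Single-frame stand-in for the filterbank scheme: DFT channel k -> channel M*k."""
    X = np.fft.rfft(x)
    Y = np.zeros_like(X)
    for k in range(len(X)):
        if M * k < len(Y):
            Y[M * k] += X[k]
    return np.fft.irfft(Y, n=len(x))


fs, n = 8_000, 1_024
t = np.arange(n) / fs
x = np.cos(2 * np.pi * 250.0 * t)          # 250 Hz falls exactly on channel 32
y = map_channels(x, 2)
print(np.argmax(np.abs(np.fft.rfft(x))), np.argmax(np.abs(np.fft.rfft(y))))  # 32 64
```

Each channel keeps its complex value when moved. In a multi-frame filterbank, the phase advance from frame to frame in channel Mk would then still correspond to the original frequency rather than M times it, which is the kind of mismatch that phase adjustments address in phase-vocoder-style transposers.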

An improved multiband transposition according to the invention incorporates phase adjustments, enhancing the
performance of the basic multiband transposition.

The time-variant pattern search prediction transposition according to the present invention may be performed by the
following:

- performing transient detection on the first signal,

- determining which segment of the first signal is to be used when duplicating/discarding parts of the first signal,
depending on the outcome of the transient detection,

- adjusting statevector and codebook properties depending on the outcome of the transient detection, and

- searching for synchronisation points in the chosen segment of the first signal, based on the synchronisation point
found in the previous synchronisation point search.
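The text does not specify how the transient detection is performed. As a stand-in, the sketch below uses a common energy-ratio detector: the segment is split into short blocks and a transient is flagged when a block's energy exceeds that of the previous block by a threshold factor. Block length, threshold and the name `detect_transient` are assumptions.

```python
import numpy as np


def detect_transient(segment, block=64, ratio=8.0):
    """Return the start index of the first block whose energy exceeds `ratio` times the
    energy of the preceding block, or None if no such jump is found."""
    n_blocks = len(segment) // block
    energy = np.array([np.sum(segment[i * block:(i + 1) * block] ** 2)
                       for i in range(n_blocks)])
    for i in range(1, n_blocks):
        if energy[i] > ratio * (energy[i - 1] + 1e-12):
            return i * block
    return None


rng = np.random.default_rng(1)
quiet = 0.01 * rng.standard_normal(1_024)
attack = quiet.copy()
attack[640:] += rng.standard_normal(384)   # onset at sample 640
print(detect_transient(quiet), detect_transient(attack))
```

The returned position is what a pattern search transposer would store and compare with the position of the previous output segment.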

The SBR methods and apparatuses according to the present invention offer the following features:

1. The methods and apparatuses exploit new concepts of signal redundancy in the spectral domain.

2. The methods and apparatuses are applicable on arbitrary signals.

3. Each harmonic set is individually created and controlled.

4. All replicated harmonics are generated in such a manner as to form a continuation of the harmonic
series.

5. The spectral replication process is based on transposition and creates no or imperceptible artifacts.

6. The spectral replication can cover multiple smaller bands and/or a wide frequency range.

7. In the SBR-1 method, the processing is performed at the decoder side only, i.e. all standards and protocols
can be used without modification.

8. The SBR-2 method can be implemented in accordance with most standards and protocols with no or minor
modifications.

9. The SBR-2 method offers the codec designer a new powerful compression tool.

10. The coding gain is significant.
The most attractive application relates to the improvement of various types of low bitrate codecs, such as MPEG 1/2
Layer I/II/III [U.S. Pat. 5,040,217], MPEG 2/4 AAC, Dolby AC-2/3, NTT TwinVQ [U.S. Pat. 5,684,920],
AT&T/Lucent PAC etc. The invention is also useful in high-quality speech codecs such as wide-band CELP and
SB-ADPCM G.722 etc. to improve perceived quality. The above codecs are widely used in multimedia, in the
telephone industry, on the Internet as well as in professional applications. T-DAB (Terrestrial Digital Audio
Broadcasting) systems use low bitrate protocols that will gain in channel utilisation by using the present method, or
improve quality in FM and AM DAB. Satellite S-DAB will gain considerably, due to the excessive system costs
involved, by using the present method to increase the number of programme channels in the DAB multiplex.
Furthermore, for the first time, full bandwidth audio real-time streaming over the Internet is achievable using low
bitrate telephone modems.

BRIEF DESCRIPTION OF THE DRAWINGS 

The present invention will now be described by way of illustrative examples, not limiting the scope or spirit of the
invention, with reference to the accompanying drawings, in which:

Fig. 1 illustrates SBR incorporated in a coding system according to the present invention;

Fig. 2 illustrates spectral replication of upper harmonics according to the present invention;

Fig. 3 illustrates spectral replication of inband harmonics according to the present invention;

Fig. 4 is a block diagram for a time-domain implementation of a transposer according to the present invention;

Fig. 5 is a flowchart representing a cycle of operation for the pattern-search prediction transposer according to the
present invention;

Fig. 6 is a flowchart representing the search for a synchronisation point according to the present invention;

Fig. 7a - 7b illustrate the codebook positioning during transients according to the present invention;

Fig. 8 is a block diagram for an implementation of several time-domain transposers in connection with a suitable
filterbank, for SBR operation according to the present invention;

Fig. 9a - 9c are block diagrams representing a device for STFT analysis and synthesis configured for generation
of 2nd order harmonics according to the present invention;

Fig. 10a - 10b are block diagrams of one sub-band with a linear frequency shift in the STFT device according to
the present invention;

Fig. 11 shows one sub-band using a phase-multiplier according to the present invention;

Fig. 12 illustrates how 3rd order harmonics are generated according to the present invention;

Fig. 13 illustrates how 2nd and 3rd order harmonics are generated simultaneously according to the present
invention;

Fig. 14 illustrates generation of a non-overlapping combination of several harmonic orders according to the
present invention;

Fig. 15 illustrates generation of an interleaved combination of several harmonic orders according to the present
invention;

Fig. 16 illustrates generation of broadband linear frequency shifts;

Fig. 17 illustrates how sub-harmonics are generated according to the present invention;

Fig. 18a - 18b are block diagrams of a perceptual codec;

Fig. 19 shows a basic structure of a maximally decimated filterbank;

Fig. 20 illustrates generation of 2nd order harmonics in a maximally decimated filterbank according to the present
invention;

Fig. 21 is a block diagram for the improved multiband transposition in a maximally decimated filterbank
operating on subband signals according to the present invention;

Fig. 22 is a flowchart representing the improved multiband transposition in a maximally decimated filterbank
operating on subband signals according to the present invention;

Fig. 23 illustrates subband samples and scalefactors of a typical codec;

Fig. 24 illustrates subband samples and envelope information for SBR-2 according to the present invention;

Fig. 25 illustrates hidden transmission of envelope information in SBR-2 according to the present invention;

Fig. 26 illustrates redundancy coding in SBR-2 according to the present invention;

Fig. 27 illustrates an implementation of a codec using the SBR-1 method according to the present invention;

Fig. 28 illustrates an implementation of a codec using the SBR-2 method according to the present invention; and

Fig. 29 is a block diagram of a "pseudo-stereo" generator according to the present invention.

DESCRIPTION OF PREFERRED EMBODIMENTS 

Throughout the explanation of the embodiments herein, emphasis is given to natural audio source coding
applications. However, it should be understood that the present invention is applicable to a range of source coding
applications other than that of encoding and decoding audio signals.

Transposition basics

Transposition as defined according to the present invention is the ideal method for spectral replication, and has
several advantages over prior art, such as: no pitch detection is required, equally high performance for
single-pitched and polyphonic programme material is obtained, and the transposition works equally well for tonal
and non-tonal signals. Contrary to other methods, the transposition according to the invention may be used in audio
source coding systems for arbitrary signal types.

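An exact transposition by a factor M of a discrete time signal x(n), in the form of a sum of sinusoids with
time-varying amplitudes,

$$x(n) = \sum_{i=0}^{N-1} e_i(n)\cos\left(\frac{2\pi f_i n}{f_s} + \alpha_i\right) \qquad (1)$$

is defined by the relation

$$y(n) = \sum_{i=0}^{N-1} e_i(n)\cos\left(\frac{2\pi M f_i n}{f_s} + \beta_i\right) \qquad (2)$$

where N is the number of sinusoids, hereafter referred to as partials, $f_i$ are the input frequencies, $e_i(n)$ and
$\alpha_i$ are the time envelopes and phase constants respectively, $\beta_i$ are arbitrary output phase constants,
$f_s$ is the sampling frequency, and $0 \le M f_i \le f_s/2$.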
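Eq. (1) and Eq. (2) can be checked numerically. The sketch below synthesises three partials with Hann-shaped time envelopes of different levels (an assumed choice), applies the exact transposition with M = 2 and $\beta_i = 0$, and lets the spectra confirm that the peaks move from $f_i$ to $Mf_i$. The function name `sum_of_partials` is ours.

```python
import numpy as np


def sum_of_partials(freqs_hz, envs, phases, fs_hz, n):
    """sum_i e_i(n) * cos(2*pi*f_i*n/fs + phase_i), with envs[i] an array over n."""
    t = np.arange(n)
    return sum(e * np.cos(2 * np.pi * f * t / fs_hz + p)
               for f, e, p in zip(freqs_hz, envs, phases))


fs, n, M = 16_000, 4_096, 2
f = [500.0, 750.0, 1_000.0]                                # input partials f_i
env = [np.hanning(n) * a for a in (1.0, 0.5, 0.25)]        # time envelopes e_i(n)
x = sum_of_partials(f, env, [0.1, 0.2, 0.3], fs, n)        # Eq. (1)
y = sum_of_partials([M * fi for fi in f], env, [0.0] * 3, fs, n)  # Eq. (2), beta_i = 0

# Spectral peaks: bins 128, 192, 256 for x and 256, 384, 512 for y (bin width 3.90625 Hz).
X, Y = np.abs(np.fft.rfft(x)), np.abs(np.fft.rfft(y))
```

Because each $e_i(n)$ is reused unchanged, the transposed partials keep the temporal envelopes of the input; only the frequencies, and arbitrarily the phases, change.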

In Fig. 2 the generation of M-th order harmonics, where M is an integer, is shown. The term M-th order harmonics is
used for simplicity, albeit the process generates M-th order harmonics to all signals in a certain frequency region,
which in most cases are themselves harmonics of unknown order. The input signal with frequency domain
representation $X(f)$ is bandlimited to the range 0 to $f_0$, 201. The signal contents in the range $f_0/M$ to
$Qf_0/M$, where Q is the desired bandwidth expansion factor, $1 < Q \le M$, are extracted by means of a bandpass
filter, forming a bandpass signal with spectrum $X_{BP}(f)$, 203. The bandpass signal is transposed by a factor M,
forming a second bandpass signal covering the range $f_0$ to $Qf_0$, 205. The spectral envelope of this signal is
adjusted by means of a programme-controlled equaliser, 207. This signal is then combined with a delayed version of
the input signal in order to compensate for the delay imposed by the bandpass filter and transposer, whereby an
output signal with spectrum $Y(f)$ covering the range 0 to $Qf_0$ is obtained, 209. Alternatively, bandpass
filtering may be performed after the transposition by M, using cut-off frequencies $f_0$ and $Qf_0$. By using
multiple transposers, simultaneous generation of different harmonic orders is of course possible. The above scheme
may also be used to "fill in" stopbands within the input signal, as shown in Fig. 3, where the input signal has a
stopband extending from $f_0$ to $Qf_0$, 301. A passband $[f_0/M, Qf_0/M]$ is then extracted, 303, transposed by a
factor M to $[f_0, Qf_0]$, 305, envelope adjusted, 307, and combined with the delayed input signal, forming the output
signal with spectrum $Y(f)$, 309.
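The band bookkeeping of Fig. 2 can be written down directly. The sketch below (function name hypothetical) returns the source passband and the target band for given $f_0$, M and Q. The condition $Q \le M$ is what keeps the upper source edge $Qf_0/M$ inside the available lowband $[0, f_0]$, and $Q > 1$ means the bandwidth is actually expanded.

```python
def replication_bands(f0_hz, M, Q):
    """Fig. 2 band edges: source passband in the lowband and its target after transposition."""
    if not 1 < Q <= M:
        raise ValueError("bandwidth expansion factor must satisfy 1 < Q <= M")
    source = (f0_hz / M, Q * f0_hz / M)   # extracted by the bandpass filter (203)
    target = (f0_hz, Q * f0_hz)           # after transposition by M (205)
    return source, target


print(replication_bands(8_000, 2, 2))     # ((4000.0, 8000.0), (8000, 16000))
print(replication_bands(8_000, 3, 1.5))
```

Every source edge multiplied by M equals the corresponding target edge, which is the transposition step in Fig. 2.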

An approximation of an exact transposition may be used. According to the present invention, the quality of such
approximations is determined using dissonance theory. A criterion for dissonance is presented by Plomp ["Tonal
Consonance and Critical Bandwidth" R. Plomp, W. J. M. Levelt, JASA, Vol. 38, 1965], and states that two partials
are considered dissonant if the frequency difference is within approximately 5 to 50% of the bandwidth of the
critical band in which the partials are situated. For reference, the critical bandwidth for a given frequency can be
approximated by

$$cb(f) = 25 + 75\left(1 + 1.4\left(\frac{f}{1000}\right)^2\right)^{0.69} \qquad (3)$$

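with f and cb in Hz. Further, Plomp states that the human auditory system cannot discriminate two partials if they
differ in frequency by approximately less than five percent of the critical bandwidth in which they are situated. The
exact transposition in Eq. 2 is approximated by

$$y(n) = \sum_{i=0}^{N-1} e_i(n)\cos\left(\frac{2\pi (M f_i + \Delta f_i) n}{f_s} + \beta_i\right) \qquad (4)$$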



where $\Delta f_i$ is the deviation from the exact transposition. If the input partials form a harmonic series, the
invention states that the deviations from the harmonic series of the transposed partials must not exceed five
percent of the critical bandwidth in which they are situated. This would explain why prior art methods give
unsatisfactory "harsh" and "rough" results, since broadband linear frequency shifts yield a much larger deviation
than acceptable. When prior art methods produce more than one partial for only one input partial, the partials must
nevertheless be within the above stated deviation limit, so as to be perceived as one partial. This again explains the
poor results obtained with prior art methods using nonlinearities etc., since they produce intermodulation partials
not within the limit of deviation.
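The five-percent criterion can be evaluated with Eq. (3). The sketch below compares an exact transposition (M = 2) of harmonics 10 to 20 of a 200 Hz fundamental with a broadband linear shift of 2100 Hz. Knowing the fundamental is an assumption made only to measure the deviation; the transposer itself needs no pitch information. Function names and test frequencies are illustrative.

```python
def critical_bandwidth(f_hz):
    """Eq. (3): approximate critical bandwidth in Hz at frequency f_hz."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69


def within_limit(partials_hz, f0_hz, fraction=0.05):
    """True if every partial lies within fraction * cb(f) of the nearest multiple of f0_hz."""
    for f in partials_hz:
        nearest = round(f / f0_hz) * f0_hz
        if abs(f - nearest) > fraction * critical_bandwidth(f):
            return False
    return True


f0, M = 200.0, 2
lowband = [k * f0 for k in range(10, 21)]     # harmonics 10..20: 2.0 ... 4.0 kHz
transposed = [M * f for f in lowband]         # exact transposition, Eq. (2)
shifted = [f + 2_100.0 for f in lowband]      # broadband linear frequency shift
print(within_limit(transposed, f0), within_limit(shifted, f0))  # True False
```

Every shifted partial lands 100 Hz from the nearest harmonic, while five percent of the critical bandwidth between 4.1 and 6.1 kHz is only about 35 to 59 Hz, so the linear shift fails the criterion throughout.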



When using the above transposition-based method of spectral replication according to the present invention, the
following important properties are achieved:

- Normally, no frequency domain overlap occurs between replicated harmonics and existing partials.

- The replicated partials are harmonically related to the partials of the input signal and will not give rise to any
annoying dissonance or artifacts.

- The spectral envelope of the replicated harmonics forms a smooth continuation of the input signal spectral
envelope, perceptually matching the original envelope.

Transposition based on time-variant pattern search prediction

[...] Such methods are [...] for speech signals [...], where the signal is divided into small parts [...]. The
segments used for the output signal depend on [...]. [...] non-tonal [...] music sounds, at a low computational
cost.

Referring to the drawings, Fig. 4 [...] 401 [...] 403 [...]. [...] the codebook is searched [...] the output segment
[...] 413, and subsequently downsampled [...]. The statevector is defined as
$$\bar{x}(n) = \left[x(n), x(n-D), x(n-2D), \dots, x(n-(d-1)D)\right] \qquad (5)$$
which is obtained from d delayed samples of the input signal, where d is the dimension of the statevector and D is
the delay between the input samples used to build the vector. The granular mapping yields the sample x(n) following
each statevector $\bar{x}(n-1)$. This gives Eq. 6, where $a(\cdot)$ is the mapping:

$$x(n) = a\left(\bar{x}(n-1)\right) \qquad (6)$$

In the present method the granular mapping is used to determine the next output based on the former output, using a
state transition codebook. The codebook of length L is continuously rebuilt, containing the statevectors and the next
sample following each statevector. Each statevector is separated from its neighbour by K samples; this enables the
system to adjust the time resolution depending on the characteristics of the currently processed signal, where K
equal to one represents the finest resolution. The input signal segment used to build the codebook is chosen based on
the position of a possible transient and the synchronisation position in the previous codebook.

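This means that the mapping $a(\cdot)$, theoretically, is evaluated for all transitions included in the codebook:

$$\begin{array}{ccc}
\bar{x}(n-L) & \rightarrow & x(n-L+1) \\
\bar{x}(n-L+K) & \rightarrow & x(n-L+K+1) \\
\vdots & & \vdots \\
\bar{x}(n-1) & \rightarrow & x(n)
\end{array} \qquad (7)$$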



With this transition codebook, the new output y(n) is calculated by searching for the statevector in the codebook
most similar to the current statevector $\bar{y}(n-1)$. This nearest-neighbour search is done by calculating the
minimum difference and gives the new output sample:

$$y(n) = a\left(\bar{y}(n-1)\right) \qquad (8)$$

However, the system is not limited to work on a sample-by-sample basis, but is preferably operated on a
segment-by-segment basis. The new output segment is windowed and added (mixed) with the previous output segment,
and subsequently downsampled. The pitch transposition factor is determined by the ratio of the input segment length
represented by the codebook and the output segment length read out of the codebook.
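Eqs. (5) to (8) amount to a nearest-neighbour lookup in a table of (statevector, next sample) pairs. The sketch below implements that lookup for a single output sample; the squared-difference distance, the parameter values and the function names are assumptions, and the segment-wise windowing and downsampling described above are left out.

```python
import numpy as np


def state_vector(sig, n, d, D):
    """Eq. (5): [sig(n), sig(n-D), ..., sig(n-(d-1)D)]."""
    return np.array([sig[n - j * D] for j in range(d)])


def build_codebook(x, n, L, K, d, D):
    """Eq. (7): statevectors at m = n-L, n-L+K, ... (< n) with the sample x(m+1) that
    follows each one. Needs n-L-(d-1)*D >= 0 and len(x) > n."""
    positions = range(n - L, n, K)
    vectors = np.array([state_vector(x, m, d, D) for m in positions])
    next_samples = np.array([x[m + 1] for m in positions])
    return vectors, next_samples


def predict_next(vectors, next_samples, current_state):
    """Eq. (8): return the successor of the most similar statevector
    (minimum squared difference)."""
    dist = np.sum((vectors - current_state) ** 2, axis=1)
    return next_samples[np.argmin(dist)]


x = np.sin(2 * np.pi * np.arange(1_000) / 50)         # period of 50 samples
vectors, nxt = build_codebook(x, n=900, L=200, K=1, d=4, D=3)
y_state = state_vector(x, 437, d=4, D=3)                # "current" statevector
print(predict_next(vectors, nxt, y_state), x[438])      # should agree
```

Increasing K thins the codebook (coarser time resolution, fewer comparisons), which is the trade-off the parameter K controls in the text.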

Returning to the drawings, Fig. 5 and Fig. 6 present flowcharts displaying the cycle of operation of the
transposer. In 501 the input data is represented. A transient detection 503 is performed on a segment of the input
signal; the search for transients is performed on a segment length equal to the output segment length. If a transient
is found 505, the position of the transient is stored 507 and the parameters L (representing the codebook length), K
(representing the distance in samples between each statevector), and D (representing the delay between samples in
each statevector) are adjusted 509. The position of the transient is compared to the position of the previous output
segment 511, in order to determine whether the transient has been processed. If so 513, the position of the codebook
(window L) and the parameters K, L and D are adjusted 515. After the necessary parameter adjustments, based on
the outcome of the transient detection, the search for a new synchronisation, or splicing, point takes place 517. This
procedure is displayed in Fig. 6. First a new synchronisation point is calculated based on the previous one 601,
according to:

$$\mathrm{Sync\_pos} = \mathrm{Sync\_pos\_old} + S \cdot M - S \qquad (9)$$

where Sync_pos and Sync_pos_old are the new and old synchronisation positions respectively, S is the length of the
input segment being processed, and M is the transposition factor. This synchronisation point is used to compare the
input segment being processed, and A/ is the Uansposition factor. Tlus synchronisation point is used to compare the 



wo 98/57436 



10 



PCT/IB98/00893 



accuracy of the new splicing point with the accuracy of the old splicing point 603. If the match is as good as or
better than the previous 605, this new synchronisation point is returned 607, provided it is within the codebook. If
not, a new synchronisation point is searched for in the loop 609. This is performed with a similarity measure, in this
case a minimum difference function 611; however, it is also possible to use correlation in the time or frequency
domain. If the position yields a better match than that of the previous position found 613, the synchronisation
position is stored 615. When all positions have been tried 617, the system returns 619 to the flowchart of Fig. 5.
The new synchronisation point obtained is stored 519, and a new segment is read out from the codebook starting at
the given synchronisation point. This segment is windowed and added to the previous output segment, downsampled
by the transposition factor 525, and stored in the output buffer 527.
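The search of Fig. 6 can be sketched as follows. Eq. (9) supplies the predicted position; if it lies within the codebook and matches at least as well as the previous splicing point, it is accepted, otherwise every position is tried with a minimum-difference measure (here the sum of absolute differences, one possible choice). The reference segment handling and all names are illustrative assumptions.

```python
import numpy as np


def match_error(codebook, pos, reference):
    """Minimum-difference similarity measure: sum of absolute differences."""
    seg = codebook[pos:pos + len(reference)]
    return np.sum(np.abs(seg - reference))


def next_sync_pos(codebook, reference, old_pos, old_error, S, M):
    """Fig. 6 sketch: predict with Eq. (9); keep the prediction if it is inside the
    codebook and matches at least as well as before, otherwise search all positions."""
    last = len(codebook) - len(reference)
    predicted = old_pos + S * M - S                       # Eq. (9)
    if 0 <= predicted <= last:
        err = match_error(codebook, predicted, reference)
        if err <= old_error:
            return predicted, err
    best_pos, best_err = None, np.inf
    for pos in range(last + 1):                           # loop 609 to 617
        err = match_error(codebook, pos, reference)
        if err < best_err:
            best_pos, best_err = pos, err
    return best_pos, best_err


codebook = np.sin(2 * np.pi * np.arange(400) / 40)        # period of 40 samples
reference = codebook[100:164]                             # stands in for the correlation segment
print(next_sync_pos(codebook, reference, old_pos=20, old_error=1e-9, S=40, M=2))
print(next_sync_pos(codebook, reference, old_pos=30, old_error=1e-9, S=40, M=2))
```

In the first call the prediction is accepted without any search; only the second call falls back to the full loop, which is where the computational saving of the refined search comes from.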

In Fig. 7 the behaviour of the system under transient conditions regarding the position of the codebook is
illustrated. Prior to the transient, codebook 1, representing input segment 1, is positioned "to the left" of segment 1.
Correlation segment 1 represents a part of the previous output and is used to find synchronisation point 1 in
codebook 1. When the transient is detected, and the point of the transient is processed, the codebook is moved
according to Fig. 7a and is stationary until the input segment currently being processed is once again "to the right"
of the codebook. This makes it impossible to duplicate the transient, since the system is not allowed to search for
synchronisation points prior to the transient.

Most pitch transposers. or time expanders, based on pattern search prediction give satisfacton^ results for speech and 
smgleDttchedmateriaL However, their performance deteriorates rapidly for high complexity signals, like musfc. in 
particular at large transposition fectors. .-n« present invenUon offers several solutions for improved performance 
therefore producing excellent results for any type of signal Contrary to other designs, the system is time-variant 'and 
the qrstem parameters are based on the prq^ties of the inpmdgnal, and the para^ 

operauon cycle, m use of a transient detector controlling not only the codebook size and position, but also the 
properties of the statevectors included, is a very robust and computationally efficient method to avoid audfcle 
degradauon during rapidly changing signal segments. Furthermore, alteration of the length of the signal segment 
bemg processed, which would raise higher computational demands, is not required. Also, the present invention 

uuhses a refined codebook search based on the results fiom the preceding sean:h. Tliis means t^ 

ordinanr correlation of two signal segments, as is usually done in time^lomain systems based on pattern search 

prediction, themostlikelysynchronisationpositionsaretriedfirstinstead ^ 
new metiiod for reducing the ood*ook search drastically reduces the computational complexity of the system. 
Further, when using several transposers. synchronisation position information can be shared among the tiamposers 
for finther reduction of the computational complexily. as shown in die foUowing imp l^n^foH^ 

The time-domain transposers as explained above are used to implement the SBR-1 and SBR-2 systems according to
the following, illustrative but not limiting, example. In Fig. 8 three time expansion modules are used in order to
generate second, third and fourth order harmonics. Since, in this example, each time-domain transposer works on a
wideband signal, it is beneficial to adjust the spectral envelope of the source frequency range prior to
transposition, considering that there will be no means to do so after the transpositions.

[...] equaliser system. The spectral envelope adjusters 801, 803 and 805 each work on several filterbank channels.
The gain of each channel in the envelope adjusters must be set so that the sum, 813, 815, 817, at the output, after
transposition, yields the desired spectral envelope. The transposers 807, 809 and 811 are interconnected in order to
share synchronisation position information. This is based on the fact that, under certain conditions, a high
correlation will occur between the synchronisation positions found in the codebook during correlation in the separate
transposing units. Assume, as an example and again not limiting the scope of the invention, that the fourth order
harmonic transposer works on a time-frame basis half of that of the second order harmonic transposer [...] the duty
cycle. Assume further that the codebooks used for the two expanders are the same and that the synchronisation
positions of the two time-domain expanders are labelled sync_pos4 and sync_pos2, respectively. This yields the
following relation:

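$$
\mathrm{sync\_pos2} = \mathrm{sync\_pos4} - n \cdot S - \mathrm{sync\_offset}, \qquad n = 1, 2, 3, 4, \ldots \qquad (10)
$$

where

$$
\mathrm{sync\_offset} = \mathrm{sync\_pos4} - \mathrm{sync\_pos2}, \qquad n = 0, \qquad (11)
$$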

and $S$ is the length of the input segment represented by the codebook. This is valid as long as neither of the synchronisation position pointers reaches the end of the codebook. During normal operation $n$ is increased by one for each time frame processed by the second order harmonic transposer, and when the codebook end is inevitably reached by either of the pointers, the counter is reset to $n = 0$, and sync_pos2 and sync_pos4 are computed individually. Similar results are obtained for the third order harmonic transposer when connected to the fourth order harmonic transposer.
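The refined codebook search and the sharing of synchronisation positions can be illustrated with the minimal Python sketch below. It is not the search used in the invention: the match criterion is assumed to be a plain normalised correlation, and the names `sync_search`, `radius` and `threshold` are hypothetical. The predicted position is passed in as an argument; in the arrangement above it could, for example, be derived from the other transposer through Eq. 10.

```python
import math
import random


def norm_corr(a, b):
    """Normalised cross-correlation of two equal-length sequences."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0.0 else 0.0


def sync_search(codebook, segment, predicted=None, radius=2, threshold=0.95):
    """Return (sync_position, number_of_correlations_evaluated).

    Hypothetical sketch: positions close to `predicted` are tried first; the
    exhaustive search is used only if none of them correlates well enough.
    """
    n = len(segment)
    last = len(codebook) - n  # last valid start position in the codebook
    evaluated = 0
    if predicted is not None:
        best_pos, best_c = None, -math.inf
        for p in range(max(0, predicted - radius), min(last, predicted + radius) + 1):
            c = norm_corr(codebook[p:p + n], segment)
            evaluated += 1
            if c > best_c:
                best_pos, best_c = p, c
        if best_pos is not None and best_c >= threshold:
            return best_pos, evaluated
    # Fallback: ordinary correlation against every codebook position.
    best_pos, best_c = 0, -math.inf
    for p in range(last + 1):
        c = norm_corr(codebook[p:p + n], segment)
        evaluated += 1
        if c > best_c:
            best_pos, best_c = p, c
    return best_pos, evaluated


rng = random.Random(1)
codebook = [rng.uniform(-1.0, 1.0) for _ in range(200)]
segment = codebook[120:152]
print(sync_search(codebook, segment, predicted=121))  # good prediction: few correlations
print(sync_search(codebook, segment, predicted=40))   # poor prediction: full search
```

With a good prediction only a handful of correlations are evaluated, whereas a poor prediction falls back to the exhaustive search; the acceptance threshold trades speed against the risk of accepting a suboptimal position.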

The above-presented use of several interconnected time-domain transposers for the creation of higher order harmonics introduces a substantial computational reduction. Furthermore, the proposed use of time-domain transposers in connection with a suitable filterbank presents the opportunity to adjust the envelope of the created spectrum while maintaining the simplicity and low computational cost of a time-domain transposer, since these may, more or less, be implemented using fixed-point arithmetic and solely additive/subtractive operations.

Other, illustrative but not limiting, examples of the present invention are:

- the use of a time-domain transposer within each subband in a subband filter bank, thus reducing the signal complexity for each transposer.

- the use of a time-domain transposer in combination with a frequency-domain transposer, thus enabling the system to use different methods for transposition depending on the characteristics of the input signal being processed.

- the use of a time-domain transposer in a wideband speech codec, operating on, for instance, the residual signal obtained after linear prediction.

It should be recognised that the method outlined above may be advantageously used for time-scale modification only, by simply omitting the sample rate conversion. Further, it is understood that although the outlined method focuses on pitch transposing to a higher pitch, i.e. time expansion, the same principles apply when transposing to a lower pitch, i.e. time compression, as is obvious to those skilled in the art.
Filter bank based transposition

Various new and innovative filter bank based transposition techniques will now be described. The signal to be transposed is divided into a series of BP- or subband signals. The subband signals are then transposed, exactly or approximately, which is advantageously accomplished by a reconnection of analysis and synthesis subbands, hereinafter referred to as a "patch". The method is first demonstrated using a Short Time Fourier Transform, STFT.

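The $N$-point STFT of a discrete-time signal $x(n)$ is defined by

$$
X_k(n) = \sum_{m=-\infty}^{\infty} h(n-m)\, x(m)\, e^{-j\omega_k m} \qquad (12)
$$

where $k = 0, 1, \ldots, N-1$, $\omega_k = 2\pi k/N$ and $h(n)$ is a window. If the window satisfies the following conditions

$$
h(0) = 1, \qquad h(n) = 0 \ \text{for} \ n = \pm N, \pm 2N, \pm 3N, \ldots \qquad (13)
$$

an inverse transform exists and is given by

$$
x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X_k(n)\, e^{j\omega_k n} \qquad (14)
$$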



The direct transform may be interpreted as an analyser, see Fig. 9a, consisting of a bank of $N$ BP-filters with impulse responses $h(n)\exp(j\omega_k n)$ 901 followed by a bank of multipliers with carriers $\exp(-j\omega_k n)$ 903, which shift the BP-signals down to regions around 0 Hz, forming the $N$ analysis signals $X_k(n)$. The window acts as a prototype LP-filter. The signals $X_k(n)$ have small bandwidths and are normally downsampled 905. Eq. 12 thus need only be evaluated at $n = rR$, where $R$ is the decimation factor and $r$ is the new time variable. $X_k(n)$ can be recovered from $X_k(rR)$ by interpolation, see Fig. 9b, i.e. insertion of zeros 907 followed by LP-filtering 909. The inverse transform may be interpreted as a synthesiser consisting of a bank of $N$ multipliers with carriers $(1/N)\exp(j\omega_k n)$ 911 that shift the signals $X_k(n)$ up to their original frequencies, followed by stages 913, Fig. 9c, that add the contributions $y_k(n)$ from all channels. The STFT and ISTFT may be rearranged in order to use the DFT and IDFT, which makes the use of FFT algorithms possible ["Implementation of the Phase Vocoder using the Fast Fourier Transform" M. R. Portnoff, IEEE ASSP Vol. 24, No. 3, 1976].
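As a numerical check of Eqs. 12 to 14, the Python sketch below evaluates the STFT directly and inverts it sample by sample, without decimation or FFTs. The triangular window is an assumption chosen only because it satisfies Eq. 13: $h(0) = 1$ and $h(n) = 0$ for $|n| \ge N$, so in particular $h(\pm N) = h(\pm 2N) = \ldots = 0$.

```python
import cmath
import math


def triangular_window(n, N):
    """h(0) = 1 and h(n) = 0 for |n| >= N, which satisfies Eq. 13."""
    return 1.0 - abs(n) / N if abs(n) < N else 0.0


def stft(x, N, n):
    """X_k(n) = sum_m h(n - m) x(m) exp(-j w_k m) for k = 0..N-1 (Eq. 12);
    x(m) is taken to be zero outside the given list."""
    out = []
    for k in range(N):
        w = 2.0 * math.pi * k / N
        out.append(sum(triangular_window(n - m, N) * x[m] * cmath.exp(-1j * w * m)
                       for m in range(len(x))))
    return out


def istft_sample(X, N, n):
    """x(n) = (1/N) sum_k X_k(n) exp(j w_k n) (Eq. 14)."""
    return sum(X[k] * cmath.exp(1j * 2.0 * math.pi * k * n / N) for k in range(N)) / N


x = [math.sin(0.37 * m) + 0.25 * math.cos(1.9 * m) for m in range(40)]
N = 8
y = [istft_sample(stft(x, N, n), N, n) for n in range(len(x))]
print(max(abs(a - b.real) for a, b in zip(x, y)))  # rounding error only
```

The reconstruction is exact up to rounding because the sum over $k$ keeps only the terms with $n - m$ a multiple of $N$, and Eq. 13 zeroes all of them except $m = n$.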



Fig. 9c shows a patch 915 for generation of second harmonics, $M = 2$, with $N = 32$. For the sake of simplicity only channels 0 through 16 are shown. The centre frequency of channel 16 equals the Nyquist frequency, and channels 17 through 31 correspond to negative frequencies. The blocks denoted P 917 and the gain blocks 919 will be described later and should presently be considered shorted out. The input signal is in this example bandlimited so that only channels 0 through 7 contain signals. Analyser channels 8 through 16 are thus empty and need not be mapped to the synthesiser. Analyser channels 0 through 7 are connected to synthesiser channels 0 through 7, representing the input signal delay path. Analysis channels $k$, where $4 \le k \le 7$, are also connected to synthesis channels $Mk$, which shift the signals to frequency regions at $M$ times the centre frequencies of the analysis channels. Hence the signals are upshifted to their original ranges as well as transposed [...] in terms of real-valued filter responses and modulators, the negative [...] branch of Fig. 10a. Hence, the combined output of the remappings $k \rightarrow Mk$ 1001 and $N-k \rightarrow N-Mk$ 1003, where $4 \le k \le 7$, must be evaluated.
This yields

$$
y_k(n) = \frac{2}{N}\big[x(n) * h(n)\cos(\omega_k n)\big]\cos\big((M-1)\omega_k n\big) - \frac{2}{N}\big[x(n) * h(n)\sin(\omega_k n)\big]\sin\big((M-1)\omega_k n\big) \qquad (15)
$$

where $M \ge 2$ and $*$ denotes convolution. Eq. 15 may be interpreted as a BP-filtering of the input signal, followed by a linear frequency shift or Upper Side Band (USB) modulation, i.e. single sideband modulation using the upper sideband, see Fig. 10b, where 1005 and 1007 form a Hilbert transformer, 1009 and 1011 are multipliers with cosine and sine carriers, and 1013 is a difference stage which selects the upper sideband. Clearly, such a multiband BP and SSB method may be implemented explicitly, i.e. without filterbank patching, in the time or frequency domain, allowing arbitrary selection of individual passbands and oscillator frequencies.

According to Eq. 15, a sinusoid with the frequency $\omega$ within the passband of analysis channel $k$ yields a harmonic at the frequency $M\omega_k + (\omega - \omega_k)$. Hence the method, referred to as basic multiband transposition, only generates exact harmonics for input signals with frequencies $\omega = \omega_k$, where $4 \le k \le 7$. However, if the number of filters is sufficiently large, the deviation from an exact transposition is negligible, see Eq. 4. Further, the transposition is made exact for quasi-stationary tonal signals of arbitrary frequencies by inserting the blocks denoted P 917 (Fig. 9c), provided every analysis channel contains at most one partial. In this case $X_k(rR)$ are complex exponentials with frequencies equal to the differences between the partial frequencies and the centre frequencies $\omega_k$ of the analysis filters. To obtain the exact transposition $M$, these frequencies must be increased by a factor $M$, modifying the above frequency relationship to $M\omega_k + M(\omega - \omega_k) = M\omega$. The frequencies of $X_k(rR)$ are equal to the time derivatives of their respective unwrapped phase angles and may be estimated using first order differences of successive phase angles. The frequency estimates are multiplied by $M$ and synthesis phase angles are calculated using those new frequencies. However, the same result, aside from a phase constant, is obtained in a simplified way by multiplying the analysis arguments by $M$ directly, eliminating the need for frequency estimation. This is described in Fig. 11, representing the blocks 917. Thus $X_k(rR)$, where $4 \le k \le 7$ in this example, are converted from rectangular to polar coordinates, illustrated by the blocks R → P 1101. The arguments are multiplied by $M = 2$ 1103 and the magnitudes are unaltered. The signals are then converted back to rectangular coordinates (P → R) 1105, forming the signals $Y_k(rR)$, and fed to synthesiser channels according to Fig. 9c. This improved multiband transposition method thus has two stages: the patch provides a coarse transposition, as in the basic method, and the phase-multipliers provide fine frequency corrections. The above multiband transposition methods differ from traditional pitch shifting techniques using the STFT, where lookup-table oscillators are used for the synthesis or, when the ISTFT is used for the synthesis, the signal is time-stretched and decimated, i.e. no patch is used.
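The difference between the basic and the improved multiband transposition can be checked numerically. The Python sketch below idealises the channel signal $X_k(rR)$ of a single partial as a unit-magnitude complex exponential at the difference frequency $\omega - \omega_k$, which a real filter bank only approximates; the function names, the decimation factor $R = 4$ and the initial phase are assumptions made for the example. It shows that multiplying the wrapped phase angle by an integer $M$ gives the same samples as multiplying the unwrapped angle, so neither unwrapping nor frequency estimation is needed.

```python
import cmath
import math


def channel_samples(omega, k, N, R, count, phase=0.3):
    """Idealised X_k(rR) for one partial at omega in channel k: a unit-magnitude
    complex exponential at the difference frequency omega - omega_k."""
    wk = 2.0 * math.pi * k / N
    return [cmath.exp(1j * ((omega - wk) * r * R + phase)) for r in range(count)]


def phase_multiply(samples, M):
    """Blocks 917 / Fig. 11: rectangular -> polar, argument times M, polar -> rectangular.
    cmath.phase returns the wrapped angle; no unwrapping is performed."""
    return [cmath.rect(abs(z), M * cmath.phase(z)) for z in samples]


def harmonic_frequency(omega, k, N, M, improved):
    """Output frequency after patching channel k to synthesis channel M*k:
    basic method M*omega_k + (omega - omega_k), improved method M*omega."""
    wk = 2.0 * math.pi * k / N
    return M * wk + (M if improved else 1) * (omega - wk)


N, k, M, R = 32, 5, 2, 4
omega = 2.0 * math.pi * k / N + 0.05  # partial slightly above the centre of channel 5
print(harmonic_frequency(omega, k, N, M, improved=False), M * omega)  # basic: offset
print(harmonic_frequency(omega, k, N, M, improved=True), M * omega)   # improved: exact
```

Because $e^{jM(\theta + 2\pi l)} = e^{jM\theta}$ for integer $M$, the $2\pi$ ambiguity of the measured phase does not affect the result; this argument no longer holds for non-integer factors.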

The harmonic patch of Fig. 9c is easily modified for transposition factors other than two. Fig. 12 shows a patch 1203 for generation of 3rd order harmonics, where 1201 are the analysis channels and 1205 are the synthesis channels. Different harmonic orders may be created simultaneously, as shown in Fig. 13, where 2nd and 3rd order harmonics are used. Fig. 14 illustrates a non-overlapping combination of 2nd, 3rd and 4th order harmonics. The lowest possible harmonic number is used as high in frequency as possible: above the upper limit of the destination range of harmonic $M$, harmonic $M+1$ is used. Fig. 15 demonstrates a method of mapping all synthesiser channels ($N = 64$, channels 0-32 shown). All highband channels with non-prime indices are mapped according to the following relation between source and destination channel numbers: $k_{dest} = M k_{source}$, where $M$ is the smallest integer that satisfies the condition that $k_{source}$ lies in the lowband and $k_{dest}$ in the highband. Hence, no synthesiser channel receives signal from more than one analysis channel. Prime-number highband channels may be mapped to $k_{source} = 1$ or to lowband channels $k_{source} > 1$ that yield good approximations of the above relation. (Only non-prime connections with $M = 2, 3, 4, 5$ are shown in Fig. 15.)
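The non-prime part of the Fig. 15 mapping can be sketched as follows. A lowband/highband split at channel 16 is assumed for illustration, and prime destination channels are left unmapped (returned as `None`) because several choices are allowed for them; `is_prime` and `harmonic_source` are hypothetical helper names. With this split, the composite channels 16 to 32 use exactly the harmonic orders 2, 3, 4 and 5.

```python
def is_prime(n):
    """Trial-division primality test for small channel indices."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def harmonic_source(k_dest, lowband_top):
    """Return (k_source, M) with k_dest = M * k_source and M the smallest integer
    for which k_source lies in the lowband (k_source < lowband_top).
    Prime destinations return None and must be handled separately."""
    if is_prime(k_dest):
        return None
    for M in range(2, k_dest + 1):
        if k_dest % M == 0 and k_dest // M < lowband_top:
            return k_dest // M, M
    return None


LOWBAND_TOP = 16  # assumption: channels 0..15 form the lowband
mapping = {k: harmonic_source(k, LOWBAND_TOP) for k in range(LOWBAND_TOP, 33)}
for k_dest, src in mapping.items():
    print(k_dest, src)
```

Since each destination channel is assigned a single source, no synthesiser channel receives signal from more than one analysis channel, as stated above.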

It is also possible to combine amplitude and phase information from different analyser channels. The amplitude signals $|X_k(rR)|$ may be connected according to Fig. 16, whereas the phase signals $\arg\{X_k(rR)\}$ are connected according to the principle of Fig. 15. In this way the lowband frequencies will still be transposed, whereby a periodic repetition of the source region envelope is generated instead of the stretched envelope that results from a transposition according to Eq. 2. Gating or other means may be incorporated in order to avoid amplification of "empty" source channels. Fig. 17 illustrates another application, the generation of sub-harmonics to a highpass filtered or bass limited signal by using connections from higher to lower subbands. When using the above transpositions it may be beneficial to employ adaptive switching of the patch based on the characteristics of the signal.

In the above description it was assumed that the highest frequency contained in the input signal was significantly lower than the Nyquist frequency. Thus, it was possible to perform a bandwidth expansion without an increase in sample rate. This is, however, not always the case, and a preceding upsampling may then be necessary. When using filter bank methods for transposition, it is possible to integrate the upsampling in the process.

Most perceptual codecs employ maximally decimated filter banks in the time to frequency mapping ["Introduction to Perceptual Coding" K. Brandenburg, AES, Collected Papers on Digital Audio Bitrate Reduction, 1996]. Fig. 18a shows the basic structure of a perceptual encoder system. The analysis filter bank 1801 splits the input signal into several subband signals. The subband samples are individually quantised 1803, using a reduced number of bits, where the number of quantisation levels is determined from a perceptual model 1807 which estimates the minimum masking threshold. The subband samples are normalised, coded with optional redundancy coding methods and combined with side information consisting of the normalisation factors, bit-allocation information and other codec specific data 1805, to form the serial bit stream. The bit stream is then stored or transmitted. In the decoder, Fig. 18b, the coded bitstream is demultiplexed 1809 and decoded, and the subband samples are re-quantised with the corresponding number of bits 1811. A synthesis filter bank combines the subband samples in order to recreate the original signal 1813. Implementations using maximally decimated filter banks will drastically reduce computational costs. In the following descriptions, there is a focus on cosine modulated filter banks. It should be appreciated, however, that the invention can be implemented using other types of filter banks or transforms, including filter bank interpretations of the wavelet transform, other non-equal bandwidth filter banks or transforms, and multi-dimensional filter banks or transforms.



In the illustrative, but not limiting, descriptions below it is assumed that a cosine modulated filter bank splits the input signal $x(n)$ into $L$ subband signals. The generic structure of a maximally decimated filter bank is shown in Fig. 19. The analysis filters are denoted $H_k(z)$ 1901, where $k = 0, 1, \ldots, L-1$. The subband signals $v_k(n)$ are maximally decimated 1903, each to the sampling frequency $f_s/L$, where $f_s$ is the sampling frequency of $x(n)$. The synthesis section reassembles the subband signals after interpolation 1905 and filtering 1907 to produce the reconstructed signal $\hat{x}(n)$. The synthesis filters are denoted $F_k(z)$. In addition, the present invention performs a spectral replication on $\hat{x}(n)$, giving an enhanced signal $y(n)$.

Synthesising the subband signals with a $QL$-channel filter bank, where only the $L$ lowband channels are used and the bandwidth expansion factor $Q$ is chosen so that $QL$ is an integer value, will result in an output signal with sampling frequency $Qf_s$. Hence, the extended filter bank will act as if it were an $L$-channel filter bank followed by an upsampler. Since, in this case, the $L(Q-1)$ highband filters are unused (fed with zeros), the audio bandwidth will not change: the filter bank will merely reconstruct an upsampled version of $\hat{x}(n)$. If, however, the $L$ subband signals are patched to the highband filters, the bandwidth of $\hat{x}(n)$ will be increased by a factor $Q$, producing $y(n)$. This is the maximally decimated filter bank version of the basic multiband transposer according to the invention. Using this scheme, the upsampling process is integrated in the synthesis filtering as explained earlier. It should be noted that any size of the synthesis filter bank may be used, resulting in different sample rates of the output signal, and hence different bandwidth expansion factors. Spectral replication on $\hat{x}(n)$ according to the basic multiband transposition method of the present invention, with an integer transposition factor $M$, is accomplished by patching the subband signals as

$$
v_{Mk}(n) = e_{Mk}(n)\,(-1)^{(M-1)kn}\, v_k(n) \qquad (16)
$$

where $k \in [0, L-1]$ and is chosen so that $Mk \in [L, QL-1]$, $e_{Mk}(n)$ is the envelope correction and $(-1)^{(M-1)kn}$ is a correction factor for spectrally inverted subbands. Spectral inversion results from decimation of subband signals, and the inverted signals may be reinverted by changing sign on every second sample in those channels. Referring to Fig. 20, consider a 16-channel synthesis filter bank, patched 2009 for a transposition factor $M = 2$, with $Q = 2$. The blocks 2001 and 2003 denote the analysis filters $H_k(z)$ and the decimators of Fig. 19, respectively. Similarly, 2005 and 2007 are the interpolators and synthesis filters $F_k(z)$. Eq. 16 then simplifies to patching of the four upper frequency subband signals of the received data into every second of the eight uppermost channels in the synthesis filter bank. Due to spectral inversion, every second patched subband signal must be frequency inverted before the synthesis. Additionally, the magnitudes of the patched signals must be adjusted 2011 according to the principles of SBR-1 or SBR-2.
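The patch of Eq. 16 can be sketched on decimated subband signals as below, assuming the subband signals are given as equal-length Python lists and the envelope correction is supplied as a function of channel and time index (both hypothetical interfaces, and unity gain by default). Channels $0$ to $L-1$ carry the received subbands, highband channel $Mk$ receives the sign-corrected and scaled $v_k(n)$, and every other channel is fed with zeros. The example reproduces the Fig. 20 case with $L = 8$, $M = 2$ and $Q = 2$.

```python
def patch_basic(lowband, M, Q, envelope=None):
    """Basic multiband transposer on maximally decimated subbands (Eq. 16 sketch).

    lowband:  list of L subband signals v_0 .. v_{L-1} (equal-length lists).
    envelope: optional function (channel, n) -> gain e_{Mk}(n); unity if omitted.
    Returns Q*L synthesis-channel signals.
    """
    L = len(lowband)
    length = len(lowband[0])
    gain = envelope or (lambda ch, n: 1.0)
    # Lowband channels pass straight through; highband channels start as zeros.
    out = [list(v) for v in lowband] + [[0.0] * length for _ in range((Q - 1) * L)]
    for k in range(L):
        dest = M * k
        if L <= dest <= Q * L - 1:
            # (-1)**((M-1)*k*n) undoes the spectral inversion caused by decimation.
            out[dest] = [gain(dest, n) * (-1) ** ((M - 1) * k * n) * lowband[k][n]
                         for n in range(length)]
    return out


low = [[k + 1.0] * 4 for k in range(8)]  # constant test signals, v_k(n) = k + 1
for ch, sig in enumerate(patch_basic(low, M=2, Q=2)):
    print(ch, sig)
```

Channels 8, 10, 12 and 14 receive subbands 4 to 7, and the odd source subbands 5 and 7 arrive with alternating signs, which is the "every second patched subband signal" inversion described above.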



Using the basic multiband transposition method according to the present invention, the generated harmonics are in general not exact multiples of the fundamentals. All frequencies but the lowest in every subband differ to some extent from an exact transposition. Further, the replicated spectrum contains zeros, since the target interval covers a wider frequency range than the source interval. Moreover, the alias cancellation properties of the cosine modulated filter bank vanish, since the subband signals are separated in frequency in the target interval; that is, neighbouring subband signals do not overlap in the high-band area. However, aliasing reduction methods known to those skilled in the art may be used to reduce this type of artifact. Advantages of this transposition method are the simple implementation and the very low computational cost.

To achieve perfect transposition of sinusoids, an effective maximally decimated filter bank solution of the improved multiband transposition method is now presented. The system uses an additional modified analysis filter bank, while the synthesis filter bank is cosine modulated as described by Vaidyanathan ["Multirate Systems and Filter Banks" P. P. Vaidyanathan, Prentice Hall, Englewood Cliffs, New Jersey, 1993, ISBN 0-13-605718-7]. The steps for operation using the improved multiband transposition method according to the present invention, based on maximally decimated filter banks, are shown schematically in Fig. 21 and in the flow chart of Fig. 22, and are as follows:



1. The $L$ received subband signals are synthesised with a $QL$-channel filter bank 2101, 2201, 2203, where the $L(Q-1)$ upper channels are fed with zeros, to form the signal $x_1(n)$, which is thus oversampled by the bandwidth expansion factor $Q$.

2. $x_1(n)$ is downsampled by a factor $Q$ to form the signal $x_2(n')$ 2103, 2205, i.e. $x_2(n') = x_1(Qn')$.

3. An integer value $K$ is chosen as the size of a synthesis filter bank, constrained so that $T = KM/Q$ is an integer, where $T$ is the size of the modified analysis filter bank and $M$ is the transposition factor 2207, 2209, 2211. $K$ should preferably be chosen large for stationary (tonal) signals, and smaller for dynamic (transient) signals.

4. $x_2(n')$ is filtered through a $T$-channel modified analysis filter bank 2107, 2213, where the filters are exponentially modulated, producing a set of complex-valued subband signals. The subband signals are downsampled by a factor $T/M$, giving subband signals $v_k^{(M)}(n'')$, $k = 0, 1, \ldots, T-1$. Hence, the filter bank will be oversampled by a factor $M$.

5. The samples $v_k^{(M)}(n'')$ are converted to a polar representation (magnitude and phase angle). The phase angles are multiplied by the factor $M$ and the samples are converted back to a rectangular representation according to the scheme of Fig. 11. The real parts of the complex-valued samples are taken, giving the signals $s_k^{(M)}(n'')$ 2109, 2215. After this operation, the signals $s_k^{(M)}(n'')$ are critically sampled.

6. The gains of the signals $s_k^{(M)}(n'')$ are adjusted according to the principles of SBR-1 or SBR-2 2111, 2217.

7. The subband signals $s_k^{(M)}(n'')$, where $k \in [T/M, \min(K,T)-1]$, are fed to a $K$-channel cosine modulated synthesis filter bank, where the channels 0 through $T/M-1$ are fed with zeros 2105, 2221. This produces the signal $x_3^{(M)}(n)$.

8. $x_3^{(M)}(n)$ is finally added to $x_1(n)$ to give $y(n)$ 2223, which is the desired spectrally replicated signal.

Steps 3 to 6 may be repeated for different values of the transposition factor $M$, thus adding multiple harmonics to $x_1(n)$. This mode of operation is illustrated by the dotted figures of Fig. 21, and in Fig. 22 by iterating the loop over boxes 2211-2219. In this case, $K$ is chosen so as to make $T$ integer-valued for all choices of $M$; for integer-valued $M$, $K$ is preferably selected so as to make $K/Q$ a positive integer. All subband signals $s_k^{(M_i)}(n'')$, where $i = 1, 2, \ldots, m$ and $m$ is the number of transposition factors, are added according to

$$
s_k(n'') = \sum_{i=1}^{m} s_k^{(M_i)}(n'') \qquad (17)
$$

for every applicable $k$. In the first iteration of the loop of Fig. 22, the signals $s_k(n'')$ may be considered to consist of samples of zeros only, for all $k$. In every loop, the new samples are added 2219 to $s_k(n'')$ as

$$
s_k(n'') \leftarrow s_k(n'') + s_k^{(M_i)}(n''), \qquad (18)
$$

where $k = K/Q, K/Q+1, \ldots, \min(K, T_i)-1$ and $T_i = KM_i/Q$. The subband signals $s_k(n'')$ are synthesised once with a $K$-channel filter bank according to step 7.
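For integer transposition factors, the parameter constraints of steps 3 to 7 and the accumulation of Eqs. 17 and 18 reduce to simple integer bookkeeping, sketched below. The function names and the dictionary layout are hypothetical; the filtering, phase multiplication and synthesis are not modelled, and each subband contribution is just a list of samples.

```python
def plan_transposers(K, Q, Ms):
    """Steps 3-7 bookkeeping for integer transposition factors (sketch).

    For each M: analysis size T = K*M/Q, decimation factor T/M (= K/Q), and the
    synthesis channel range [K/Q, min(K, T) - 1] that receives signals.
    """
    if K <= 0 or K % Q != 0:
        raise ValueError("choose K so that K/Q is a positive integer")
    plans = []
    for M in Ms:
        T = K * M // Q  # exact, because K/Q is an integer
        plans.append({"M": M, "T": T, "decimation": T // M,
                      "first": K // Q, "last": min(K, T) - 1})
    return plans


def accumulate(K, plans, contributions):
    """Eqs. 17/18: start from all-zero s_k(n'') and add each s_k^(M_i)(n'') on its
    channels k = K/Q .. min(K, T_i) - 1. contributions[i][k] is a list of samples."""
    length = len(contributions[0][0])
    s = [[0.0] * length for _ in range(K)]
    for plan, sub in zip(plans, contributions):
        for k in range(plan["first"], plan["last"] + 1):
            s[k] = [a + b for a, b in zip(s[k], sub[k])]
    return s


for p in plan_transposers(32, 4, [2, 3, 5]):
    print(p)
```

When $M < Q$, the analysis bank is smaller than the synthesis bank and only channels up to $T-1$ are filled, which is why the upper limit is $\min(K, T_i) - 1$.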
The modified analysis filter bank of step 4 is derived through the theory of cosine modulated filter banks, of which the modulated lapped transform (MLT) ["Lapped Transforms for Efficient Transform/Subband Coding" H. S. Malvar, IEEE Trans. ASSP, vol. 38, no. 6, 1990] is a special case. The impulse responses $h_k(n)$ of the filters in a $T$-channel cosine modulated filter bank may be written

$$
h_k(n) = C\, p_0(n) \cos\!\left(\frac{\pi}{2T}(2k+1)\left(n - \frac{N-1}{2}\right) + \Phi_k\right) \qquad (19)
$$

where $k = 0, 1, \ldots, T-1$, $N$ is the length of the lowpass prototype filter $p_0(n)$, $C$ is a constant and $\Phi_k$ is a phase angle that ensures alias cancellation between adjacent channels. The constraints on $\Phi_k$ are

$$
\Phi_0 = \pm\frac{\pi}{4}, \qquad \Phi_{T-1} = \pm\frac{\pi}{4} \qquad \text{and} \qquad \Phi_k = \Phi_{k-1} \pm \frac{\pi}{2} \qquad (20\text{a-c})
$$

which may be simplified to the closed-form expression

$$
\Phi_k = \pm(-1)^k \frac{\pi}{4}. \qquad (21)
$$

With this choice of $\Phi_k$, perfect reconstruction systems or approximate reconstruction systems (pseudo-QMF systems) may be obtained using synthesis filter banks with impulse responses

$$
f_k(n) = C\, p_0(n) \cos\!\left(\frac{\pi}{2T}(2k+1)\left(n - \frac{N-1}{2}\right) - \Phi_k\right). \qquad (22)
$$

Consider the filters

$$
h'_k(n) = C\, p_0(n) \sin\!\left(\frac{\pi}{2T}(2k+1)\left(n - \frac{N-1}{2}\right) + \Phi_k\right), \qquad (23)
$$

where $h'_k(n)$ are sine-modulated versions of the prototype filter $p_0(n)$. The filters $H_k(z)$ and $H'_k(z)$ have identical passband supports, but their phase responses differ. The passbands of the filters are in fact Hilbert transforms of each other (this is not valid for frequencies close to $\omega = 0$ and $\omega = \pi$). Combining Eq. 19 and Eq. 23 according to

$$
\tilde{h}_k(n) = h_k(n) + j\,h'_k(n) = C\, p_0(n) \exp\!\left\{ j\left(\frac{\pi}{2T}(2k+1)\left(n - \frac{N-1}{2}\right) + \Phi_k\right)\right\} \qquad (24)
$$

yields filters that have the same shape of the magnitude responses as $H_k(z)$ for positive frequencies, but are approximately zero for negative frequencies. Using a filter bank with impulse responses as in Eq. 24 gives a set of subband signals that may be interpreted as the analytic (complex) signals corresponding to the subband signals obtained from a filter bank with impulse responses as in Eq. 19. Analytic signals are suitable for manipulation, since the complex-valued samples may be written in polar form, that is $z(n) = \mathrm{real}\{z(n)\} + j\,\mathrm{imag}\{z(n)\} = |z(n)| \exp\{j \arg(z(n))\}$. However, when using the complex filter bank for transposition, the constraint on $\Phi_k$ has to be generalised to retain the alias cancellation property. The new constraint on $\Phi_k$, to ensure alias cancellation in combination with a synthesis filter bank with impulse responses as in Eq. 22, is

$$
\Phi_k = \pm(-1)^k \frac{\pi}{4M} \qquad (25)
$$

which simplifies to Eq. 21 when $M = 1$. With this choice, transposed partials will have the same relative phases as they would have when $M = 1$ (no transposition).



wo 98/57436 



18 



PCT/IB98/00893 



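Combining Eq. 24 and Eq. 25 results in

$$
\tilde{h}_k(n) = C\, p_0(n) \exp\!\left\{ j\left(\frac{\pi}{2T}(2k+1)\left(n - \frac{N-1}{2}\right) \pm (-1)^k \frac{\pi}{4M}\right)\right\} \qquad (26)
$$

which are the filters used in the modified filter bank of step 4, according to the present invention.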
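The relations between Eqs. 19, 23, 24 and 26, and the phase constraints of Eqs. 20, 21 and 25, can be checked numerically with the sketch below. The sine-window prototype $p_0(n)$, the constant $C = 1$ and the choice of the plus sign in Eq. 25 are assumptions for the example; any prototype satisfying the filter bank design constraints could be substituted, and the function names are hypothetical.

```python
import cmath
import math


def phi(k, M, sign=1):
    """Eq. 25: Phi_k = +/-(-1)^k * pi / (4M); M = 1 reproduces Eq. 21."""
    return sign * (-1) ** k * math.pi / (4 * M)


def analysis_filters(T, N, k, M, C=1.0):
    """Cosine (Eq. 19), sine (Eq. 23) and complex (Eqs. 24/26) impulse responses
    of channel k, all using Phi_k from Eq. 25 and an assumed sine-window p0(n)."""
    p0 = [math.sin(math.pi * (n + 0.5) / N) for n in range(N)]
    args = [math.pi / (2 * T) * (2 * k + 1) * (n - (N - 1) / 2) + phi(k, M)
            for n in range(N)]
    h_cos = [C * p * math.cos(a) for p, a in zip(p0, args)]
    h_sin = [C * p * math.sin(a) for p, a in zip(p0, args)]
    h_cplx = [C * p * cmath.exp(1j * a) for p, a in zip(p0, args)]
    return h_cos, h_sin, h_cplx


h_cos, h_sin, h_cplx = analysis_filters(T=8, N=16, k=3, M=2)
print(len(h_cplx), [round(phi(k, 2), 4) for k in range(4)])
```

Adjacent phase angles differ by $\pi/(2M)$, which is the non-inverted phase difference used later for relative phase locking.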



Some clarifications concerning step 5: downsampling the complex-valued subband signals by a factor $T/M$ makes them oversampled by $M$, which is an essential criterion when the phase angles are subsequently multiplied by the transposition factor $M$. The oversampling forces the number of subband samples per bandwidth, after transposition to the target range, to equal that of the source range. The individual bandwidths of the transposed subband signals are $M$ times greater than those in the source range, due to the phase-multiplier. This makes the subband signals critically sampled after step 5, and additionally there will be no zeros in the spectrum when transposing tonal signals.



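In order to avoid trigonometric calculations, that is, having to compute the new subband signals as

$$
s_k^{(M)}(n'') = \mathrm{real}\left\{ \left|v_k^{(M)}(n'')\right| \exp\!\left( jM \arctan\frac{\mathrm{imag}\{v_k^{(M)}(n'')\}}{\mathrm{real}\{v_k^{(M)}(n'')\}} \right) \right\} = \left|v_k^{(M)}(n'')\right| \cos\!\left( M \arctan\frac{\mathrm{imag}\{v_k^{(M)}(n'')\}}{\mathrm{real}\{v_k^{(M)}(n'')\}} \right) \qquad (27)
$$

where $|v_k^{(M)}(n'')|$ is the absolute value of $v_k^{(M)}(n'')$ and the arctangent is understood as the four-quadrant phase angle of $v_k^{(M)}(n'')$, the following trigonometric relationship is used:

$$
\cos(M\alpha) = \cos^M(\alpha) - \binom{M}{2}\sin^2(\alpha)\cos^{M-2}(\alpha) + \binom{M}{4}\sin^4(\alpha)\cos^{M-4}(\alpha) - \ldots \qquad (28)
$$

Letting

$$
\alpha = \arctan\frac{\mathrm{imag}\{v_k^{(M)}(n'')\}}{\mathrm{real}\{v_k^{(M)}(n'')\}} \qquad (29)
$$

and noting that

$$
\cos(\alpha) = \frac{\mathrm{real}\{v_k^{(M)}(n'')\}}{\sqrt{\mathrm{real}\{v_k^{(M)}(n'')\}^2 + \mathrm{imag}\{v_k^{(M)}(n'')\}^2}} \qquad (30)
$$

and

$$
\sin(\alpha) = \frac{\mathrm{imag}\{v_k^{(M)}(n'')\}}{\sqrt{\mathrm{real}\{v_k^{(M)}(n'')\}^2 + \mathrm{imag}\{v_k^{(M)}(n'')\}^2}}, \qquad (31)
$$

the computations of step 5 may be accomplished without trigonometric calculations, reducing the computational complexity.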
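The trigonometry-free evaluation of step 5 can be sketched as follows: Eqs. 30 and 31 supply $\cos\alpha$ and $\sin\alpha$ from the real and imaginary parts, and Eq. 28 is summed term by term. The reference function uses `math.atan2` (a four-quadrant arctangent) only to check the result; both function names are hypothetical.

```python
import math
from math import comb


def real_part_phase_multiplied_trig(re, im, M):
    """Reference for Eq. 27: |v| * cos(M * arg v), using a four-quadrant angle."""
    return math.hypot(re, im) * math.cos(M * math.atan2(im, re))


def real_part_phase_multiplied(re, im, M):
    """Eqs. 28-31: the same value using only +, -, *, / and one square root."""
    r = math.sqrt(re * re + im * im)
    if r == 0.0:
        return 0.0
    c, s = re / r, im / r  # cos(alpha) and sin(alpha), Eqs. 30 and 31
    total = 0.0
    for i in range(M // 2 + 1):  # binomial expansion of cos(M * alpha), Eq. 28
        total += (-1) ** i * comb(M, 2 * i) * c ** (M - 2 * i) * s ** (2 * i)
    return r * total


for M in (2, 3, 4):
    print(M, real_part_phase_multiplied(-0.6, 0.2, M),
          real_part_phase_multiplied_trig(-0.6, 0.2, M))
```

Equivalently, the sum equals $\mathrm{real}\{v^M\}/|v|^{M-1}$, which suggests an implementation by repeated complex multiplication; the expansion above simply follows Eq. 28 directly.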
When using transpositions where $M$ is even, problems with the phase-multiplier may arise, depending on the characteristics of the lowpass prototype filter $p_0(n)$. All applicable prototype filters have zeros on the unit circle in the $z$-plane. A zero on the unit circle imposes a 180° shift in the phase response of the filter. For $M$ even, the phase-multiplier translates these shifts to 360° shifts, i.e. the phase shifts vanish. Partials located at frequencies where such phase shifts vanish will give rise to aliasing in the synthesised signal. The worst case is when a partial is located at a frequency corresponding to the top of the first side lobe of an analysis filter. Depending on the rejection of this lobe in the magnitude response, the aliasing will be more or less audible. As an example, the first side lobe of the prototype filter used in the ISO/MPEG Layer 1 and 2 standard is rejected by 96 dB, while the rejection is only 23 dB for the first side lobe of the sine window used in the MDCT scheme of the ISO/MPEG Layer 3 standard. Clearly, this type of aliasing will be audible when the sine window is used. A solution to this problem, referred to as relative phase locking, is presented below.

The filters $\tilde{h}_k(n)$ all have linear phase responses. The phase angles $\Phi_k$ introduce relative phase differences between adjacent channels, and the zeros on the unit circle introduce 180° phase shifts at locations in frequency that may differ between channels. By monitoring the phase difference between neighbouring subband signals before the phase-multiplier is activated, it is easy to detect the channels that contain phase-inverted information. Considering tonal signals, the phase difference is approximately $\pi/(2M)$, according to Eq. 25, for non-inverted signals, and consequently approximately $\pi(1 - 1/(2M))$ for signals where either of the signals is inverted. The detection of inverted signals may be accomplished simply by computing the dot product of samples in adjacent subbands as

$$
\mathrm{real}\{v_k^{(M)}(n'')\}\,\mathrm{real}\{v_{k+1}^{(M)}(n'')\} + \mathrm{imag}\{v_k^{(M)}(n'')\}\,\mathrm{imag}\{v_{k+1}^{(M)}(n'')\} \qquad (32)
$$

If the product in Eq. 32 is negative, the phase difference is greater than 90°, and a phase-inversion condition is present. The phase angles of the complex-valued subband signals are multiplied by $M$, according to the scheme of step 5, and finally the inversion-tagged signals are negated. The relative phase locking method thus forces the 180°-shifted subband signals to retain this shift after the phase multiplication, and hence maintains the aliasing cancellation properties.
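Relative phase locking can be sketched on one sample per channel as below. The rule for deciding which channel of an inverted pair gets tagged is not spelled out above; the sketch assumes the inversion parity is accumulated upwards from channel 0, so a channel is tagged when an odd number of inverted pairs lies below it. Function names are hypothetical.

```python
import cmath
import math


def detect_inversions(v):
    """Eq. 32: True for each adjacent pair (k, k+1) whose dot product is negative,
    i.e. whose phase difference exceeds 90 degrees."""
    return [a.real * b.real + a.imag * b.imag < 0.0 for a, b in zip(v, v[1:])]


def phase_multiply_locked(v, M):
    """Multiply phase angles by M (step 5), then negate the inversion-tagged channels.
    Assumption: tags are propagated upwards from channel 0 by parity."""
    tagged = [False]
    for inverted_pair in detect_inversions(v):
        tagged.append(tagged[-1] != inverted_pair)
    out = []
    for z, negate in zip(v, tagged):
        w = cmath.rect(abs(z), M * cmath.phase(z))
        out.append(-w if negate else w)
    return out


# One sample per channel of a tonal signal with M = 2: adjacent phase step
# pi/(2M) = pi/4, with channels 2 and 3 carrying an extra 180-degree shift.
v = [cmath.exp(0j), cmath.exp(1j * math.pi / 4),
     -cmath.exp(1j * math.pi / 2), -cmath.exp(1j * 3 * math.pi / 4)]
print(detect_inversions(v))
print(phase_multiply_locked(v, 2))
```

Without the final negation, doubling the phase would make channels 2 and 3 indistinguishable from non-inverted channels; with it, they keep their 180° offset relative to the non-inverted case.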

Spectral envelope adjustment 

Most sounds, like speech and music, are characterised as products of slowly varying envelopes and rapidly varying carriers with constant amplitude, as described by Stockham ["The Application of Generalized Linearity to Automatic Gain Control" T. G. Stockham, Jr., IEEE Trans. on Audio and Electroacoustics, Vol. AU-16, No. 2, June 1968] and Eq. 1.



In split-band perceptual audio coders, the audio signal is segmented into frames and split into multiple frequency bands using subband filters or a time-to-frequency domain transform. In most codec types, the signal is subsequently separated into two major signal components for transmission or storage: the spectral envelope representation and the normalised subband samples or coefficients. Throughout the following description, the term "subband samples" or "coefficients" refers to sample values obtained from subband filters as well as coefficients obtained from a time-to-frequency transform. The term "spectral envelope" or "scale factors" represents values of the subbands on a time-frame basis, such as the average or maximum magnitude in each subband, used for normalisation of the subband samples. However, the spectral envelope may also be obtained using linear prediction,



LPC [U.S. Pat. 5,684,920]. In a typical codec, the normalised subband samples require coding at a high bitrate (using approximately 90% of the available bitrate), compared to the slowly varying temporal envelopes and the spectral envelopes, which may be coded at a much reduced rate (using approximately 10% of the available bitrate).

Accurate spectral envelope [...] to be preserved [...] perceived timbre of a musical instrument, or voice [...] distribution below a frequency [...] located in the highest octave [...] of less importance, and consequently the highband fine structures obtained by the above transposition methods require no adjustment, while the coarse structures generally do. [...] filter the spectral representation of the signal to separate the envelope [...]

In the SBR-1 implementation according to the present invention, the highband coarse spectral envelope is estimated from the lowband information available [...] envelope of the lowband and adjusting the highband spectral envelope according to specific rules. A novel method to accomplish the envelope estimation uses asymptotes in a logarithmic frequency-magnitude space, which is equivalent to curve fitting with polynomials of varying order in the linear space. [...] portion of the lowband spectrum are estimated, and the estimates are used to define the level [...] several segments representing the new highband envelope. The asymptote intersections are fixed in frequency and act as pivot points. Although not always necessary, it is beneficial to stipulate constraints to keep the highband envelope excursions within realistic boundaries.

An alternative approach to estimation of the spectral envelope is to use vector quantisation, VQ, of a large number of representative spectral envelopes, and to store these in a lookup table or codebook. Vector quantisation is performed by training [...] training data, in this case audio spectral envelopes. Training is usually done with the Generalised Lloyd Algorithm ["Vector Quantization and Signal Compression" A. Gersho, R. M. Gray, Kluwer Academic Publishers, USA 1992, ISBN 0-7923-9181-0], and yields vectors that optimally cover the contents of the training data. Considering a VQ codebook consisting of $A$ spectral envelopes trained by $B$ envelopes ($B \gg A$), the $A$ envelopes represent the $A$ most likely transitions [...] the codebook, and the highband part of the best matching [...] highband spectrum.
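Both estimation approaches can be sketched in simplified form. The asymptote part below fits a single least-squares straight line in log-frequency/log-magnitude space to a portion of the lowband and extrapolates it, with optional clamps standing in for the "realistic boundaries" constraint; the method described above uses several segments joined at fixed pivot points, which this sketch does not model. The VQ part assumes a plain nearest-neighbour search (squared error) over the lowband part of each codebook envelope. All function names and parameters are hypothetical.

```python
import math


def fit_log_asymptote(freqs, mags):
    """Least-squares straight line in log-frequency/log-magnitude space.
    Returns (slope, intercept) of log(mag) = intercept + slope * log(f)."""
    xs = [math.log(f) for f in freqs]
    ys = [math.log(m) for m in mags]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def extrapolate_envelope(slope, intercept, freqs, floor=None, ceiling=None):
    """Evaluate the asymptote at highband frequencies, optionally clamped."""
    env = []
    for f in freqs:
        m = math.exp(intercept + slope * math.log(f))
        if floor is not None:
            m = max(m, floor)
        if ceiling is not None:
            m = min(m, ceiling)
        env.append(m)
    return env


def vq_highband(lowband_env, codebook, split):
    """Highband part (entries from `split` on) of the codebook envelope whose
    lowband part is closest, in squared error, to the measured lowband envelope."""
    def distance(entry):
        return sum((a - b) ** 2 for a, b in zip(entry[:split], lowband_env))
    return min(codebook, key=distance)[split:]


freqs = [1000.0 * 2 ** (i / 4) for i in range(9)]  # 1 kHz to 4 kHz
mags = [3.0 * f ** -1.5 for f in freqs]            # synthetic power-law lowband
slope, intercept = fit_log_asymptote(freqs, mags)
print(slope, extrapolate_envelope(slope, intercept, [8000.0, 16000.0]))
```

A straight line in the log-log plane corresponds to a power law in the linear domain, so a synthetic power-law lowband is recovered exactly by the fit.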



[...] subband samples 240[...] [...] maintaining a significant bitrate reduction.
In some codecs, it is possible to transmit the scalefactors for the entire spectral envelope while omitting the
highband subband samples, as shown in Fig. 24. Other codec standards stipulate that scalefactors and subband
samples must cover the same frequency range, i.e. scalefactors cannot be transmitted if the subband samples are
omitted. In such cases, there are several solutions. First, the highband spectral envelope information can be
transmitted in separate frames, where the frames have their own headers and optional error protection, followed by
the data. Regular decoders, not taking advantage of the present invention, will not recognise the headers and will
therefore discard the extra frames. In a second solution, the highband spectral envelope information is transmitted
as auxiliary data within the encoded bitstream; however, the available auxiliary data field must be large enough to
hold the envelope information. In cases where neither of the first two solutions is applicable, a third solution may
be applied, where the highband spectral envelope information is hidden as subband samples. Subband scalefactors cover
a large dynamic range, typically exceeding 100 dB. It is thus possible to set an arbitrary number of subband
scalefactors, 2505 in Fig. 25, to very low values, and to transmit the highband scalefactors "camouflaged" as subband
samples 2501. This way of transmitting the highband scalefactors to the decoder 2503 ensures compatibility with the
bitstream syntax. Hence, arbitrary data may be transmitted in this fashion. A related method exists where
information is coded into the subband sample stream [U.S. Pat. 5,687,191]. A fourth solution, Fig. 26, can be
applied when a coding system uses Huffman or other redundancy coding 2603. The subband samples for the highband are
then set to zero 2601, or to a constant value, so as to achieve a high redundancy.
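The third and fourth solutions can be illustrated with a small Python sketch. It is a toy model resting on assumptions, not any codec's actual bitstream: the dequantization rule amplitude = q * 10^(sf/20), the -120 dB carrier scalefactor and the helper names `camouflage`, `uncamouflage` and `entropy_bits` are all hypothetical. Zeroth-order entropy stands in for the cost of a Huffman code, since the average Huffman code length cannot fall below it.

```python
import math
from collections import Counter

# Hypothetical value: a carrier scalefactor low enough to make the band inaudible.
CAMOUFLAGE_SF_DB = -120.0


def dequantize(q_samples, sf_db):
    """Generic (hypothetical) dequantizer: amplitude = q * 10^(sf_dB / 20)."""
    gain = 10.0 ** (sf_db / 20.0)
    return [q * gain for q in q_samples]


def camouflage(highband_sf):
    """Encoder (third solution): carry integer highband scalefactors as the
    quantized samples of a carrier subband whose own scalefactor is very low."""
    return {"sf_db": CAMOUFLAGE_SF_DB, "q": [int(s) for s in highband_sf]}


def uncamouflage(carrier):
    """SBR-aware decoder: read the carrier's quantized samples back as scalefactors."""
    return list(carrier["q"])


def entropy_bits(symbols):
    """Zeroth-order entropy in bits per symbol, a lower bound on the average
    length of a Huffman code for these symbols (fourth solution)."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())
```

A regular decoder that plays back the carrier band reconstructs amplitudes of only q * 10^-6, far below audibility, while an SBR-aware decoder recovers the scalefactors exactly. Replacing the highband samples of a frame with zeros concentrates the symbol distribution and lowers its entropy, which is what makes the fourth solution cheap under redundancy coding.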

Transient response improvements 

Transient-related artifacts are common problems in audio codecs, and similar artifacts occur in the present invention.
In general, patching generates spectral "zeros" or notches, corresponding to time-domain pre- and post-echoes, i.e.
spurious transients before and after "true" transients. Although the [...] blocks "fill in the zeros" for slowly
varying tonal signals, the pre- and post-echoes remain. The improved multiband method is intended to work on discrete
sinusoids, where the number of sinusoids is restricted to one per subband. Transients or noise in a subband can be
viewed as a large number of discrete sinusoids within that subband, which generates intermodulation distortion. These
artifacts are considered as additional quantization-noise sources connected to the replicated highband channels during
transient intervals. Traditional methods to avoid pre- and post-echo artifacts in perceptual audio coders, for example
adaptive window switching, may hence be used to enhance the subjective quality of the improved multiband method. By
using the transient detection provided by the codec or a separate detector, and reducing the number of channels under
transient conditions, the "quantization noise" is forced not to exceed the time-dependent masking threshold. A
smaller number of channels is used during transient passages, whereas a larger number is used during tonal passages.
Such adaptive window switching is commonly used in codecs in order to trade frequency resolution for time resolution.
Different methods may be used in applications where the filterbank size is fixed. One approach is to shape the
"quantization noise" in time via linear prediction in the spectral domain. The transposition is then performed on the
residual signal, which is the output of the linear prediction filter. Subsequently, an inverse prediction filter is
applied to the original and spectrally replicated channels simultaneously. Another approach employs a compander
system, i.e. dynamic amplitude compression of the transient signal prior to transposition or coding, and a
complementary expansion after transposition. It is also possible to switch between transposition methods in a
signal-dependent manner; for example, a high-resolution filterbank transposition method is used for stationary
signals, and a time-variant pattern search prediction method is employed for transient signals.
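The linear-prediction approach can be sketched as follows, in the spirit of temporal noise shaping: a predictor is fitted across the frequency index of one frame's spectral coefficients (autocorrelation method with the Levinson-Durbin recursion), the lowband residual is transposed, and the inverse (all-pole) filter is then run over the original and replicated coefficients in a single pass. The function names, the predictor order and the use of a plain copy of the residual as the transposition step are assumptions for illustration only.

```python
def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of a sequence of spectral coefficients."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]


def levinson(r, order):
    """Levinson-Durbin recursion. Returns a[0..order-1] such that x[k] is
    predicted by a[0]*x[k-1] + ... + a[order-1]*x[k-order]."""
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate input, e.g. all-zero coefficients
        k = (r[m] - sum(a[i] * r[m - i] for i in range(1, m))) / err
        prev = a[:]
        a[m] = k
        for i in range(1, m):
            a[i] = prev[i] - k * prev[m - i]
        err *= 1.0 - k * k
    return a[1:]


def analysis(x, a):
    """Prediction-error filter over the frequency index: e[k] = x[k] - prediction."""
    return [x[k] - sum(a[i] * x[k - 1 - i] for i in range(len(a)) if k - 1 - i >= 0)
            for k in range(len(x))]


def synthesis(e, a):
    """Inverse (all-pole) filter: rebuilds the coefficients from the residual."""
    x = []
    for k in range(len(e)):
        x.append(e[k] + sum(a[i] * x[k - 1 - i] for i in range(len(a)) if k - 1 - i >= 0))
    return x


def replicate_with_lpc(lowband, order=2):
    """Transpose the residual rather than the coefficients, then apply the
    inverse filter to original and replicated coefficients together."""
    a = levinson(autocorr(lowband, order), order)
    residual = analysis(lowband, a)
    replicated = residual + residual  # placeholder transposition: plain copy-up
    return synthesis(replicated, a)
```

Because the synthesis filter is causal in the frequency index, the first half of the output reproduces the original lowband coefficients (up to rounding), while the replicated half inherits the same spectral-domain prediction and hence the same temporal shaping.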
Practical implementations

Using a standard signal processor or a powerful PC, real-time operation of an SBR-enhanced codec is possible. The
SBR-enhanced codec may also be hard-coded on a custom chip. It may also be implemented in various kinds of systems
for storage or transmission of signals, analogue or digital, using arbitrary codecs (Fig. 27 and Fig. 28). The
SBR-1 method may be integrated in a decoder or supplied as an add-on hardware or software post-processing module.
The SBR-2 method needs additional modification of the encoder. In Fig. 27 the analogue input signal is fed to the
A/D converter 2701, forming a digital signal which is fed to an arbitrary encoder 2703, where source coding is
performed. The signal fed into the system may be of such a low-pass type that spectral bands within the auditory
range have already been discarded, or spectral bands are discarded in the arbitrary encoder. The resulting lowband
signals are fed to the multiplexer 2705, forming a serial bitstream which is transmitted or stored 2707. The
de-multiplexer 2709 restores the signals and feeds them to an arbitrary decoder 2711. The spectral envelope
information 2715 is estimated at the decoder side and fed to the SBR-1 unit 2713, which transposes the lowband signal
to a highband signal and creates an envelope-adjusted wideband signal. Finally, the digital wideband signal is
converted 2717 to an analogue output signal.

The SBR-2 method needs additional modification of the encoder. In Fig. 28 the analogue input signal is fed to the
A/D converter 2801, forming a digital signal which is fed to an arbitrary encoder 2803, where source coding is
performed. The spectral envelope information is extracted 2805. The resulting signals, lowband subband samples or
coefficients and wideband envelope information, are fed to the multiplexer 2807, forming a serial bitstream which is
transmitted or stored 2809. The de-multiplexer 2811 restores the signals, lowband subband samples or coefficients
and wideband envelope information, and feeds them to an arbitrary decoder 2815. The spectral envelope information
2813 is fed from the de-multiplexer 2811 to the SBR-2 unit 2817, which transposes the lowband signal to a highband
signal and creates an envelope-adjusted wideband signal. Finally, the digital wideband signal is converted 2819 to
an analogue output signal.
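The SBR-2 signal flow of Fig. 28 can be reduced to two functions operating on one frame of spectral magnitudes: the encoder keeps the lowband coefficients and extracts a highband envelope, and the decoder transposes the lowband upward and scales each highband band to the transmitted energy. This is a minimal sketch under assumptions not taken from the text: the envelope is the mean energy per band on a fixed band grid `edges`, the transposition is a plain periodic copy of the lowband, and quantization and multiplexing are omitted. The function names are hypothetical. In SBR-1 (Fig. 27) the `envelope` argument would instead be estimated at the decoder.

```python
import math


def band_energies(spec, edges):
    """Mean energy of `spec` in each band [edges[i], edges[i+1])."""
    return [sum(v * v for v in spec[lo:hi]) / (hi - lo) for lo, hi in zip(edges, edges[1:])]


def sbr2_encode(spec, crossover, edges):
    """Encoder side: keep the lowband coefficients and extract the envelope
    of the highband bands defined by `edges` (edges[0] == crossover)."""
    return spec[:crossover], band_energies(spec, edges)


def sbr2_decode(lowband, envelope, edges):
    """Decoder side: copy the lowband up as a placeholder transposition, then
    scale each highband band so its mean energy matches the envelope.
    Assumes edges[0] == len(lowband) and a non-empty lowband."""
    total = edges[-1]
    out = [lowband[k % len(lowband)] for k in range(total)]  # periodic copy-up
    for (lo, hi), target in zip(zip(edges, edges[1:]), envelope):
        current = sum(v * v for v in out[lo:hi]) / (hi - lo)
        gain = math.sqrt(target / current) if current > 0 else 0.0
        out[lo:hi] = [v * gain for v in out[lo:hi]]
    return out
```

The lowband passes through unchanged, and after adjustment each highband band carries exactly the energy measured at the encoder, while its fine structure comes from the transposed lowband.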

When only very low bitrates are available (Internet and slow telephone modems, AM broadcasting, etc.), mono coding
of the audio programme material is unavoidable. In order to improve the perceived quality and make the programme
more pleasant sounding, a simple "pseudo-stereo" generator, Fig. 29, is obtained by the introduction of a tapped
delay line 2901. This may feed 10 ms and 15 ms delayed signals at approximately -6 dB 2903 to each output channel,
in addition to the original mono signal 2905. The pseudo-stereo generator offers a valuable perceptual improvement
at a low computational cost.
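A direct reading of the tapped delay line is sketched below. The text does not say which delay feeds which channel, so the sketch assumes that the 10 ms tap goes to the left output and the 15 ms tap to the right, each attenuated by 6 dB and added to the undelayed mono signal; `pseudo_stereo` and its parameters are hypothetical names.

```python
def pseudo_stereo(mono, fs, delays_ms=(10.0, 15.0), gain_db=-6.0):
    """Return (left, right): each channel is the mono signal plus one delayed,
    attenuated copy taken from a tapped delay line (assumed tap-to-channel mapping)."""
    gain = 10.0 ** (gain_db / 20.0)  # -6 dB is roughly a factor of 0.5
    outputs = []
    for d_ms in delays_ms:
        d = round(fs * d_ms / 1000.0)  # tap position in samples
        outputs.append([x + (gain * mono[n - d] if n >= d else 0.0)
                        for n, x in enumerate(mono)])
    return outputs[0], outputs[1]
```

Each output sample costs one multiplication and one addition per channel, consistent with the low computational cost noted above; the different tap positions are what decorrelate the two channels.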



The above-described embodiments are merely illustrative of the principles of the present invention for audio source
coding improvement. It is understood that modifications and variations of the arrangements and the details described
herein will be apparent to others skilled in the art. It is the intent, therefore, to be limited only by the scope of
the impending patent claims and not by the specific details presented by way of description and explanation of the
embodiments herein.



